Add multi-host copy support to build-and-copy.sh
Updated build-and-copy.sh to copy Docker images to multiple hosts via the new -c/--copy-to flag, which accepts a space- or comma-separated host list. The old --copy-to-host flag is retained as a backward-compatible alias. The short -h flag, previously the short form of --copy-to-host, now shows help. The README was updated to document these changes and provide new usage examples.
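The script itself is not part of this diff, so the following is only a minimal sketch of how a flag that accepts space- or comma-separated hosts might be parsed. The names `COPY_HOSTS`, `SSH_USER`, `add_hosts` and `parse_args` are hypothetical and are not taken from build-and-copy.sh. The sketch assumes that every argument after the flag counts as a host until the next argument starting with `-`.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: COPY_HOSTS, SSH_USER, add_hosts and parse_args are
# hypothetical names, not taken from build-and-copy.sh.

COPY_HOSTS=""            # space-separated list of target hosts
SSH_USER="${USER:-}"     # assumed default: the current user

# Append the host(s) found in one argument to COPY_HOSTS.
# Commas are turned into spaces so "a,b" and "a b" end up identical.
add_hosts() {
    for host in $(printf '%s' "$1" | tr ',' ' '); do
        COPY_HOSTS="${COPY_HOSTS:+$COPY_HOSTS }$host"
    done
}

parse_args() {
    COPY_HOSTS=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -c|--copy-to|--copy-to-host)
                shift
                # Consume every following argument until the next option.
                while [ $# -gt 0 ]; do
                    case "$1" in
                        -*) break ;;
                    esac
                    add_hosts "$1"
                    shift
                done
                if [ -z "$COPY_HOSTS" ]; then
                    echo "error: --copy-to needs at least one host" >&2
                    return 1
                fi
                ;;
            -u|--user)
                if [ $# -lt 2 ]; then
                    echo "error: --user needs a value" >&2
                    return 1
                fi
                SSH_USER="$2"
                shift 2
                ;;
            *)
                echo "error: unknown option: $1" >&2
                return 1
                ;;
        esac
    done
}
```

A caller would run `parse_args "$@"` and then iterate with `for host in $COPY_HOSTS`. Treating the legacy `--copy-to-host` as one more pattern in the same `case` branch is one simple way to keep an alias without duplicating logic.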
README.md: 36 lines changed
@@ -12,6 +12,12 @@ The Dockerfile builds from the main branch of VLLM, so depending on when you run

 ## CHANGELOG

+### 2025-12-18
+
+Updated `build-and-copy.sh` to support copying to multiple hosts.
+- Added `-c, --copy-to` (accepts space- or comma-separated host lists) and kept `--copy-to-host` as a backward-compatible alias.
+- Short `-h` is now used for help.
+
 ### 2025-12-15

 Updated `build-and-copy.sh` flags:
@@ -28,7 +34,7 @@ Triton is now being built from the source, alongside with its companion triton_k

 Added new flags to `build-and-copy.sh`:
 - `--triton-sha <sha>`: Specify Triton commit SHA (defaults to v3.5.1 currently)
-- `--no-build`: Skip building and only copy existing image (requires `--copy-to-host`)
+- `--no-build`: Skip building and only copy existing image (requires `--copy-to`)

 ### 2025-12-11 update

@@ -65,7 +71,7 @@ Using a provided build script is recommended, but if you want to build using `do

 ### Using the Build Script (Recommended)

-The `build-and-copy.sh` script automates the build process and optionally copies the image to another node. This is the recommended method for building and deploying to multiple Spark nodes.
+The `build-and-copy.sh` script automates the build process and optionally copies the image to one or more nodes. This is the recommended method for building and deploying to multiple Spark nodes.

 **Basic usage (build only):**

@@ -79,18 +85,24 @@ The `build-and-copy.sh` script automates the build process and optionally copies
 ./build-and-copy.sh --tag my-vllm-node
 ```

-**Build and copy to another Spark node:**
+**Build and copy to Spark node(s):**

-Using the same username as currently logged-in user:
+Using the same username as currently logged-in user (single host):

 ```bash
-./build-and-copy.sh --copy-to-host 192.168.177.12
+./build-and-copy.sh --copy-to 192.168.177.12
 ```

+Copy to multiple hosts (space- or comma-separated after the flag):
+
+```bash
+./build-and-copy.sh --copy-to 192.168.177.12 192.168.177.13
+```
+
 Using a different username:

 ```bash
-./build-and-copy.sh --copy-to-host 192.168.177.12 --user your_username
+./build-and-copy.sh --copy-to 192.168.177.12 --user your_username
 ```

 **Force rebuild vLLM source only:**
@@ -108,7 +120,7 @@ Using a different username:
 **Combined example (rebuild vLLM and copy to another node):**

 ```bash
-./build-and-copy.sh --rebuild-vllm --copy-to-host 192.168.177.12
+./build-and-copy.sh --rebuild-vllm --copy-to 192.168.177.12
 ```

 **Build with specific Triton commit:**
@@ -120,7 +132,7 @@ Using a different username:
 **Copy existing image without rebuilding:**

 ```bash
-./build-and-copy.sh --no-build --copy-to-host 192.168.177.12
+./build-and-copy.sh --no-build --copy-to 192.168.177.12
 ```

 **Available options:**
@@ -132,10 +144,11 @@ Using a different username:
 | `--rebuild-vllm` | Force rebuild vLLM source only (sets CACHEBUST_VLLM) |
 | `--triton-ref <ref>` | Triton commit SHA, branch or tag (default: 'v3.5.1') |
 | `--vllm-ref <ref>` | vLLM commit SHA, branch or tag (default: 'main') |
-| `-h, --copy-to-host <host>` | Host address to copy the image to after building |
+| `-c, --copy-to <host[,host...] or host host...>` | Host(s) to copy the image to after building (space- or comma-separated list after the flag). |
+| `--copy-to-host` | Alias for `--copy-to` (backwards compatibility). |
 | `-u, --user <user>` | Username for SSH connection (default: current user) |
-| `--no-build` | Skip building, only copy existing image (requires `--copy-to-host`) |
-| `--help` | Show help message |
+| `--no-build` | Skip building, only copy existing image (requires `--copy-to`) |
+| `-h, --help` | Show help message |

 **IMPORTANT**: When copying to another node, make sure you use the Spark IP assigned to its ConnectX 7 interface (enp1s0f1np1), and not the 10G interface (enP7s7)!

@@ -330,4 +343,3 @@ Modify `--num-prompts` to benchmark concurrent requests - the command above will
 ### Hardware Architecture

 **Note:** The Dockerfile defaults to `TORCH_CUDA_ARCH_LIST=12.1a` (NVIDIA GB10). If you are using different hardware, update the `ENV` variable in the Dockerfile before building.
-