This commit is contained in:
@@ -1,4 +1,3 @@
|
||||
|
||||
# vLLM Docker Optimized for DGX Spark (single or multi-node)
|
||||
|
||||
This repository contains the Docker configuration and startup scripts to run a multi-node vLLM inference cluster using Ray. It supports InfiniBand/RDMA (NCCL) and custom environment configuration for high-performance setups.
|
||||
@@ -1104,6 +1103,7 @@ The script attempts to automatically detect:
|
||||
* **Cluster peers:** Discovered by scanning the `ETH_IF` subnet for hosts with SSH access **and** a GB10 GPU (`nvidia-smi --query-gpu=name` must return `NVIDIA GB10`).
|
||||
* **Copy hosts (`COPY_HOSTS`):** In standard mode, same as cluster peers. In mesh mode, scanned separately on `enp1s0f0np0` and `enp1s0f1np1` subnets so that image/model transfers use the direct InfiniBand path.
|
||||
|
||||
|
||||
### Manual Overrides
|
||||
|
||||
You can override the auto-detected values if needed:
|
||||
|
||||
Reference in New Issue
Block a user