Merge branch 'main' of gitea.corredorconect.com:software-engineering/spark-vllm-docker
Some checks failed
Build and Push spark-vllm / docker (push) Failing after 4s
@@ -1,4 +1,3 @@
# vLLM Docker Optimized for DGX Spark (single or multi-node)
This repository contains the Docker configuration and startup scripts to run a multi-node vLLM inference cluster using Ray. It supports InfiniBand/RDMA (NCCL) and custom environment configuration for high-performance setups.
@@ -1104,6 +1103,7 @@ The script attempts to automatically detect:
* **Cluster peers:** Discovered by scanning the `ETH_IF` subnet for hosts with SSH access **and** a GB10 GPU (`nvidia-smi --query-gpu=name` must return `NVIDIA GB10`).
* **Copy hosts (`COPY_HOSTS`):** In standard mode, same as cluster peers. In mesh mode, scanned separately on `enp1s0f0np0` and `enp1s0f1np1` subnets so that image/model transfers use the direct InfiniBand path.
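
The peer check described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual detection code: the function names `is_gb10`, `probe_host`, and `discover_peers`, the `SSH_CMD` override, the SSH options, and the `--format=csv,noheader` flag are all assumptions added here. Subnet scanning is left out; the sketch only filters a list of candidate hosts that is passed in.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the "SSH access AND GB10 GPU" check.
# None of these names come from the repository's scripts.

# Succeeds only if the reported GPU name is exactly "NVIDIA GB10".
is_gb10() {
  [ "$1" = "NVIDIA GB10" ]
}

# Probes one host over SSH and checks the first GPU name it reports.
# SSH_CMD is an assumed override hook (defaults to plain ssh).
probe_host() {
  gpu_name="$("${SSH_CMD:-ssh}" -o BatchMode=yes -o ConnectTimeout=2 "$1" \
    nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null | head -n 1)"
  is_gb10 "$gpu_name"
}

# Prints, one per line, the candidate hosts that pass the probe.
discover_peers() {
  for host in "$@"; do
    if probe_host "$host"; then
      printf '%s\n' "$host"
    fi
  done
}
```

Calling `discover_peers` with a list of candidate addresses would print only the hosts that accept a non-interactive SSH login and report a GB10 GPU. Hosts that refuse SSH produce an empty GPU name and are skipped, which matches the "SSH access **and** a GB10 GPU" condition.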
### Manual Overrides
You can override the auto-detected values if needed: