Merge branch 'main' of gitea.corredorconect.com:software-engineering/spark-vllm-docker
Some checks failed
Build and Push spark-vllm / docker (push) Failing after 4s

2026-05-11 13:21:53 -05:00


@@ -1,4 +1,3 @@
# vLLM Docker Optimized for DGX Spark (single or multi-node)
This repository contains the Docker configuration and startup scripts to run a multi-node vLLM inference cluster using Ray. It supports InfiniBand/RDMA (NCCL) and custom environment configuration for high-performance setups.
@@ -1104,6 +1103,7 @@ The script attempts to automatically detect:
* **Cluster peers:** Discovered by scanning the `ETH_IF` subnet for hosts with SSH access **and** a GB10 GPU (`nvidia-smi --query-gpu=name` must return `NVIDIA GB10`).
* **Copy hosts (`COPY_HOSTS`):** In standard mode, these are the same as the cluster peers. In mesh mode, they are scanned separately on the `enp1s0f0np0` and `enp1s0f1np1` subnets, so that image and model transfers use the direct InfiniBand path.
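The detection rules above can be illustrated with a minimal Python sketch. This is not the repository's script, which is not shown here. It assumes the subnets are already known as CIDR strings, and its function names (`gpu_names_via_ssh`, `discover_peers`, `copy_hosts`) and SSH options are hypothetical. The GPU check adds `--format=csv,noheader` because `nvidia-smi --query-gpu` needs a `--format` argument. The probe is injectable, so the filtering logic can be exercised without real hosts.

```python
import ipaddress
import subprocess
from typing import Callable, Iterable, List

GB10_NAME = "NVIDIA GB10"  # GPU name a peer must report (from the rule above)


def gpu_names_via_ssh(host: str, timeout: float = 5.0) -> List[str]:
    """Return GPU names reported by nvidia-smi on `host`, or [] if SSH fails."""
    cmd = [
        "ssh",
        "-o", "BatchMode=yes",      # hypothetical choice: fail instead of prompting
        "-o", "ConnectTimeout=3",
        host,
        "nvidia-smi", "--query-gpu=name", "--format=csv,noheader",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return []  # unreachable host or missing ssh binary
    if result.returncode != 0:
        return []  # no SSH access, or nvidia-smi failed on the remote side
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def discover_peers(
    subnet: str,
    probe: Callable[[str], List[str]] = gpu_names_via_ssh,
    exclude: Iterable[str] = (),
) -> List[str]:
    """Scan every host address in `subnet` and keep those reporting a GB10 GPU."""
    skip = set(exclude)  # e.g. this node's own address
    peers = []
    for ip in ipaddress.ip_network(subnet, strict=False).hosts():
        host = str(ip)
        if host not in skip and GB10_NAME in probe(host):
            peers.append(host)
    return peers


def copy_hosts(
    mesh: bool,
    eth_subnet: str,
    mesh_subnets: Iterable[str],
    probe: Callable[[str], List[str]] = gpu_names_via_ssh,
) -> List[str]:
    """Standard mode reuses the ETH_IF scan; mesh mode scans each direct-link subnet."""
    if not mesh:
        return discover_peers(eth_subnet, probe)
    hosts: List[str] = []
    for subnet in mesh_subnets:  # e.g. the enp1s0f0np0 and enp1s0f1np1 subnets
        for host in discover_peers(subnet, probe):
            if host not in hosts:
                hosts.append(host)
    return hosts
```

Keeping the probe separate from the scan loop means the SSH and `nvidia-smi` details can change without touching the subnet enumeration. A real scan would also benefit from probing hosts in parallel, which this sketch omits for clarity.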
### Manual Overrides
You can override the auto-detected values if needed: