Updated README

eugr
2025-11-26 13:30:17 -08:00
parent 549214e6ed
commit 2a7d31ad81


@@ -1,7 +1,15 @@
# vLLM Ray Cluster Node Docker for DGX Spark
This repository contains the Docker configuration and startup scripts to run a multi-node vLLM inference cluster using Ray. It supports InfiniBand/RDMA (NCCL) and custom environment configuration for high-performance setups.
## DISCLAIMER
This repository is not affiliated with NVIDIA or its subsidiaries. The content is provided as reference material only and is not intended for production use.
Some of the steps and parameters may be unnecessary, and some may be missing. This is a work in progress. Use at your own risk!
The Dockerfile builds from the main branch of vLLM, so depending on when you run the build, it may not be in a fully functioning state.
## 1. Building the Docker Image
The Dockerfile includes specific **Build Arguments** to allow you to selectively rebuild layers (e.g., update the vLLM source code without re-downloading PyTorch).
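For illustration, a selective rebuild might look like the sketch below. The build-argument names here (`VLLM_REF`, `VLLM_CACHEBUST`) and the image tag are placeholders; check the Dockerfile for the actual arguments it defines.
```bash
# Hypothetical build-arg names -- consult the Dockerfile for the real ones.
# Changing a cache-bust style argument invalidates only the vLLM source layer,
# so the PyTorch base layers stay cached.
docker build \
  --build-arg VLLM_REF=main \
  --build-arg VLLM_CACHEBUST=$(date +%s) \
  -t vllm-ray-spark:latest .
```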
@@ -93,9 +101,9 @@ docker run --privileged --gpus all -it --rm \
**Flags Explained:**
* `--net=host`: **Required.** Ray and NCCL need full access to host network interfaces.
* `--ipc=host`: **Recommended.** Allows shared memory access for PyTorch/NCCL. Alternatively, you can set a fixed shared-memory size with `--shm-size=16g`.
* `--privileged`: **Recommended for InfiniBand.** Grants the container access to RDMA devices (`/dev/infiniband`). As an alternative, you can pass `--ulimit memlock=-1 --ulimit stack=67108864 --device=/dev/infiniband`. A combined example follows below.
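Putting these flags together, a non-privileged launch might look like this sketch (the image tag `vllm-ray-spark:latest` is a placeholder for whatever you built above):
```bash
# Sketch only -- image tag is a placeholder; adjust to your build.
docker run --gpus all -it --rm \
  --net=host \
  --ipc=host \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  --device=/dev/infiniband \
  vllm-ray-spark:latest
```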
-----
@@ -119,6 +127,21 @@ Normally you would start it with the container like in the example above, but yo
| `-i` | `--ib-if` | InfiniBand interface name (e.g., `ib0`, `rocep1s0f1`). | **Yes** |
| `-m` | `--head-ip` | The IP address of the **Head Node**. | Only if role is `node` |
**Hint**: to decide which interfaces to use, run `ibdev2netdev`. You will see output like this:
```
rocep1s0f0 port 1 ==> enp1s0f0np0 (Down)
rocep1s0f1 port 1 ==> enp1s0f1np1 (Up)
roceP2p1s0f0 port 1 ==> enP2p1s0f0np0 (Down)
roceP2p1s0f1 port 1 ==> enP2p1s0f1np1 (Up)
```
Each physical port on Spark has two pairs of logical interfaces in Linux.
Current NVIDIA guidance recommends using only one of them; in this example that would be `enp1s0f1np1` for Ethernet and `rocep1s0f1` for InfiniBand.
Make sure you allocate IP addresses to the interfaces you use (there is no need to assign IPs to their "twins").
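For example, assigning a static address to the active interface could look like this (the subnet is a placeholder; use whatever addressing scheme your cluster fabric expects):
```bash
# Placeholder subnet -- substitute your cluster's addressing scheme.
sudo ip addr add 192.168.100.11/24 dev enp1s0f1np1
sudo ip link set enp1s0f1np1 up
```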
### Example: Starting inside the Head Node
```bash
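# The original command was elided from this hunk. Below is a hypothetical
# sketch: the script name and --role flag are placeholders, while
# --ib-if matches the options table above.
./start_node.sh --role head --ib-if rocep1s0f1
```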
@@ -174,8 +197,5 @@ And execute vllm command inside.
### Hardware Architecture
**Note:** The Dockerfile defaults to `TORCH_CUDA_ARCH_LIST=12.1a` (NVIDIA GB10). If you are using different hardware, update the `ENV` variable in the Dockerfile before building:
* **H100:** `9.0`
* **A100:** `8.0`
* **L40S:** `8.9`
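For example, to target H100 you would change the corresponding line in the Dockerfile (sketch; only this `ENV` line changes, the rest of the Dockerfile stays as shipped):
```dockerfile
# Override for H100 -- the repo default targets GB10 (12.1a).
ENV TORCH_CUDA_ARCH_LIST=9.0
```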