Compare commits
12 Commits
prebuilt-f...main
| Author | SHA1 | Date | |
|---|---|---|---|
| | a5b1c7006e | | |
| | ee6129d54e | | |
| | f30289ec57 | | |
| | 97e6afcf3b | | |
| | eae788259a | | |
| | 896cdefedf | | |
| | d3dbfb682a | | |
| | 0bb0da779e | | |
| | f307d8dc76 | | |
| | 1d0fe50d46 | | |
| | f24d177802 | | |
| | bb0d120177 | | |
53
.gitea/workflows/build.yml
Normal file
@@ -0,0 +1,53 @@
+name: Build and Push spark-vllm
+
+on:
+  push:
+    branches:
+      - main
+  workflow_dispatch:
+
+env:
+  IMAGE_NAME: spark-vllm-docker
+  IMAGE_TAG: latest
+
+jobs:
+  docker:
+    runs-on: dgx
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Log in to Gitea Container Registry
+        uses: docker/login-action@v3
+        with:
+          registry: ${{ github.server_url }}
+          username: ${{ secrets.CI_USER }}
+          password: ${{ secrets.CI_PASSWORD }}
+
+
+      - name: Build image using upstream script
+        env:
+          DOCKER_DEFAULT_PLATFORM: linux/arm64
+        run: |
+          bash build-and-copy.sh -t ${IMAGE_NAME}:${IMAGE_TAG}
+
+      - name: Tag and Push Docker Image
+        shell: bash
+        run: |
+          VERSION=${{ github.run_number }}
+
+          REGISTRY=${GITHUB_SERVER_URL#https://}
+
+          TARGET_IMAGE=$REGISTRY/${{ github.repository_owner }}/${IMAGE_NAME}
+
+          docker tag \
+            ${IMAGE_NAME}:${IMAGE_TAG} \
+            $TARGET_IMAGE:$VERSION
+
+          docker tag \
+            ${IMAGE_NAME}:${IMAGE_TAG} \
+            $TARGET_IMAGE:latest
+
+          docker push $TARGET_IMAGE:$VERSION
+          docker push $TARGET_IMAGE:latest
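The "Tag and Push Docker Image" step builds the pushed image reference from the server URL, the repository owner, the image name, and the run number. The sketch below reproduces that derivation as a standalone function so it can be checked locally. `target_image` is a hypothetical helper that is not part of the workflow, and the host, owner, and version values are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the workflow): prints the image reference that the
# "Tag and Push Docker Image" step would assemble from the same four inputs.
target_image() {
  local server_url="$1" owner="$2" image_name="$3" version="$4"
  # Same parameter expansion as the workflow: strip a leading "https://"
  # so that only the registry host remains.
  local registry="${server_url#https://}"
  printf '%s/%s/%s:%s\n' "$registry" "$owner" "$image_name" "$version"
}

# Placeholder inputs for illustration only.
target_image "https://git.example.com" "alice" "spark-vllm-docker" "42"
```

Note that `${VAR#https://}` removes only that exact prefix, so a server URL starting with `http://` would pass through unchanged and yield an invalid image reference.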
@@ -1,4 +1,3 @@
-
 # vLLM Docker Optimized for DGX Spark (single or multi-node)
 
 This repository contains the Docker configuration and startup scripts to run a multi-node vLLM inference cluster using Ray. It supports InfiniBand/RDMA (NCCL) and custom environment configuration for high-performance setups.
@@ -1104,6 +1103,7 @@ The script attempts to automatically detect:
 * **Cluster peers:** Discovered by scanning the `ETH_IF` subnet for hosts with SSH access **and** a GB10 GPU (`nvidia-smi --query-gpu=name` must return `NVIDIA GB10`).
 * **Copy hosts (`COPY_HOSTS`):** In standard mode, same as cluster peers. In mesh mode, scanned separately on `enp1s0f0np0` and `enp1s0f1np1` subnets so that image/model transfers use the direct InfiniBand path.
 
+
 ### Manual Overrides
 
 You can override the auto-detected values if needed:
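The peer check described in the README hunk above (SSH reachable and reporting a GB10 GPU) could look roughly like the following. This is a minimal sketch, not the repository's actual detection code: `is_gb10` and `probe_host` are hypothetical names, and the SSH timeout is an assumed value.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the repository's real detection logic may differ.

# True when the reported GPU name is exactly the one the README requires.
is_gb10() {
  [ "$1" = "NVIDIA GB10" ]
}

# Probe one host: non-interactive SSH, ask nvidia-smi for the GPU name,
# and accept the host only if the first GPU reported is a GB10.
probe_host() {
  local host="$1" name
  name="$(ssh -o BatchMode=yes -o ConnectTimeout=2 "$host" \
    nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null | head -n 1)"
  is_gb10 "$name"
}
```

`BatchMode=yes` makes SSH fail instead of prompting for a password, so hosts without key-based access are simply skipped during a scan.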
@@ -471,6 +471,7 @@ fi
 COMMON_BUILD_FLAGS+=("--build-arg" "BUILD_JOBS=$BUILD_JOBS")
 COMMON_BUILD_FLAGS+=("--build-arg" "TORCH_CUDA_ARCH_LIST=$GPU_ARCH_LIST")
 COMMON_BUILD_FLAGS+=("--build-arg" "FLASHINFER_CUDA_ARCH_LIST=$GPU_ARCH_LIST")
+COMMON_BUILD_FLAGS+=("--platform" "linux/arm64")
 if [ -n "$NETWORK_ARG" ]; then
     COMMON_BUILD_FLAGS+=("--network" "$NETWORK_ARG")
 fi
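The hunk above appends `--platform linux/arm64` to a bash array of build flags. The sketch below shows, under assumed values, how such an array expands into separate arguments when passed on. The `NETWORK_ARG` value and the final `docker build` command line are illustrative only; the command is printed, not executed.

```shell
#!/usr/bin/env bash
# Illustrative values; the real script sets these elsewhere.
NETWORK_ARG="host"

COMMON_BUILD_FLAGS=()
# Flag and value are separate array elements, so each becomes its own argument.
COMMON_BUILD_FLAGS+=("--platform" "linux/arm64")
if [ -n "$NETWORK_ARG" ]; then
  COMMON_BUILD_FLAGS+=("--network" "$NETWORK_ARG")
fi

# Print the command that would run; "${array[@]}" preserves argument boundaries.
echo docker build "${COMMON_BUILD_FLAGS[@]}" .
```

With the flag in the script itself, the build targets `linux/arm64` even when `DOCKER_DEFAULT_PLATFORM` is not set in the calling environment.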