12 Commits

Author SHA1 Message Date
a5b1c7006e fix image name
Some checks failed
Build and Push spark-vllm / docker (push) Failing after 6m2s
2026-05-11 15:04:37 -05:00
ee6129d54e push run num
Some checks failed
Build and Push spark-vllm / docker (push) Has been cancelled
2026-05-11 15:03:02 -05:00
f30289ec57 fix tag and push
Some checks failed
Build and Push spark-vllm / docker (push) Failing after 30s
2026-05-11 15:01:19 -05:00
97e6afcf3b fix label
Some checks failed
Build and Push spark-vllm / docker (push) Failing after 7m12s
2026-05-11 14:50:08 -05:00
eae788259a run job on arm64
Some checks failed
Build and Push spark-vllm / docker (push) Has been cancelled
2026-05-11 14:48:16 -05:00
896cdefedf build on arm64
Some checks failed
Build and Push spark-vllm / docker (push) Failing after 1m30s
2026-05-11 14:40:20 -05:00
d3dbfb682a set docker platform to arm64
Some checks failed
Build and Push spark-vllm / docker (push) Failing after 4m39s
2026-05-11 14:34:17 -05:00
0bb0da779e run using bash
Some checks failed
Build and Push spark-vllm / docker (push) Failing after 15m0s
2026-05-11 13:40:42 -05:00
f307d8dc76 Merge branch 'main' of gitea.corredorconect.com:software-engineering/spark-vllm-docker
Some checks failed
Build and Push spark-vllm / docker (push) Failing after 4s
2026-05-11 13:21:53 -05:00
1d0fe50d46 login using action 2026-05-11 13:21:19 -05:00
f24d177802 Update README.md
Some checks failed
Build and Push spark-vllm / docker (push) Failing after 4s
2026-05-11 18:20:33 +00:00
bb0d120177 gitea workflow 2026-05-11 13:16:59 -05:00
3 changed files with 55 additions and 1 deletions

@@ -0,0 +1,53 @@
name: Build and Push spark-vllm
on:
  push:
    branches:
      - main
  workflow_dispatch:
env:
  IMAGE_NAME: spark-vllm-docker
  IMAGE_TAG: latest
jobs:
  docker:
    runs-on: dgx
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Log in to Gitea Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ github.server_url }}
          username: ${{ secrets.CI_USER }}
          password: ${{ secrets.CI_PASSWORD }}
      - name: Build image using upstream script
        env:
          DOCKER_DEFAULT_PLATFORM: linux/arm64
        run: |
          bash build-and-copy.sh -t ${IMAGE_NAME}:${IMAGE_TAG}
      - name: Tag and Push Docker Image
        shell: bash
        run: |
          VERSION=${{ github.run_number }}
          REGISTRY=${GITHUB_SERVER_URL#https://}
          TARGET_IMAGE=$REGISTRY/${{ github.repository_owner }}/${IMAGE_NAME}
          docker tag \
            ${IMAGE_NAME}:${IMAGE_TAG} \
            $TARGET_IMAGE:$VERSION
          docker tag \
            ${IMAGE_NAME}:${IMAGE_TAG} \
            $TARGET_IMAGE:latest
          docker push $TARGET_IMAGE:$VERSION
          docker push $TARGET_IMAGE:latest
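
The "Tag and Push Docker Image" step builds the push target by stripping the URL scheme from the server URL and appending the repository owner and image name. A minimal sketch of that derivation follows, runnable outside CI. The server URL, owner, and run number are hypothetical stand-ins for values the runner would normally supply.

```shell
#!/usr/bin/env bash
# Sketch only: reproduces the string handling of the tag-and-push step.
# Every value below is a hypothetical stand-in, not taken from a real run.
set -eu

GITHUB_SERVER_URL="https://gitea.example.com"  # assumed; provided by the runner in CI
REPOSITORY_OWNER="software-engineering"        # stand-in for github.repository_owner
IMAGE_NAME="spark-vllm-docker"
VERSION="42"                                   # stand-in for github.run_number

# ${var#pattern} removes the shortest matching prefix, here the "https://" scheme.
REGISTRY="${GITHUB_SERVER_URL#https://}"
TARGET_IMAGE="${REGISTRY}/${REPOSITORY_OWNER}/${IMAGE_NAME}"

# These are the two references the step would tag and push.
echo "${TARGET_IMAGE}:${VERSION}"
echo "${TARGET_IMAGE}:latest"
```

Note that the `#https://` expansion leaves the value unchanged when the server URL uses `http://`, so a plain-HTTP instance would produce an image reference that still contains the scheme.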

@@ -1,4 +1,3 @@
# vLLM Docker Optimized for DGX Spark (single or multi-node)

This repository contains the Docker configuration and startup scripts to run a multi-node vLLM inference cluster using Ray. It supports InfiniBand/RDMA (NCCL) and custom environment configuration for high-performance setups.
@@ -1104,6 +1103,7 @@ The script attempts to automatically detect:
* **Cluster peers:** Discovered by scanning the `ETH_IF` subnet for hosts with SSH access **and** a GB10 GPU (`nvidia-smi --query-gpu=name` must return `NVIDIA GB10`).
* **Copy hosts (`COPY_HOSTS`):** In standard mode, same as cluster peers. In mesh mode, scanned separately on `enp1s0f0np0` and `enp1s0f1np1` subnets so that image/model transfers use the direct InfiniBand path.

### Manual Overrides

You can override the auto-detected values if needed:

@@ -471,6 +471,7 @@ fi
COMMON_BUILD_FLAGS+=("--build-arg" "BUILD_JOBS=$BUILD_JOBS")
COMMON_BUILD_FLAGS+=("--build-arg" "TORCH_CUDA_ARCH_LIST=$GPU_ARCH_LIST")
COMMON_BUILD_FLAGS+=("--build-arg" "FLASHINFER_CUDA_ARCH_LIST=$GPU_ARCH_LIST")
COMMON_BUILD_FLAGS+=("--platform" "linux/arm64")
if [ -n "$NETWORK_ARG" ]; then
    COMMON_BUILD_FLAGS+=("--network" "$NETWORK_ARG")
fi
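
The hunk above appends `--platform linux/arm64` to a bash array of build flags. The sketch below shows how such an array expands into a single command line, on the assumption that the script later passes it to a `docker build` call, which this hunk does not show. The values for `BUILD_JOBS`, `GPU_ARCH_LIST`, and `NETWORK_ARG` are hypothetical, and the command is printed, not executed.

```shell
#!/usr/bin/env bash
# Sketch only: shows how a flag array like COMMON_BUILD_FLAGS expands.
# The three values below are hypothetical; the real script computes them earlier.
set -euo pipefail

BUILD_JOBS=8
GPU_ARCH_LIST="12.1"
NETWORK_ARG=""

COMMON_BUILD_FLAGS=()
COMMON_BUILD_FLAGS+=("--build-arg" "BUILD_JOBS=$BUILD_JOBS")
COMMON_BUILD_FLAGS+=("--build-arg" "TORCH_CUDA_ARCH_LIST=$GPU_ARCH_LIST")
COMMON_BUILD_FLAGS+=("--platform" "linux/arm64")
if [ -n "$NETWORK_ARG" ]; then
    COMMON_BUILD_FLAGS+=("--network" "$NETWORK_ARG")
fi

# Quoting "${array[@]}" keeps each flag and its value as separate words.
# The command is only printed here; the docker build call is an assumption.
printf '%s ' docker build "${COMMON_BUILD_FLAGS[@]}" .
echo
```

Keeping each flag and its value as separate array elements means values containing spaces survive expansion intact, which a single concatenated string would not guarantee.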