4 Commits

| SHA1 | Message | Checks | Date |
|---|---|---|---|
| f307d8dc76 | Merge branch 'main' of gitea.corredorconect.com:software-engineering/spark-vllm-docker | Build and Push spark-vllm / docker (push): failing after 4s | 2026-05-11 13:21:53 -05:00 |
| 1d0fe50d46 | login using action | | 2026-05-11 13:21:19 -05:00 |
| f24d177802 | Update README.md | Build and Push spark-vllm / docker (push): failing after 4s | 2026-05-11 18:20:33 +00:00 |
| bb0d120177 | gitea workflow | | 2026-05-11 13:16:59 -05:00 |
2 changed files with 54 additions and 1 deletions


@@ -0,0 +1,53 @@
name: Build and Push spark-vllm
on:
  push:
    branches:
      - main
  workflow_dispatch:
env:
  IMAGE_NAME: spark-vllm
  IMAGE_TAG: latest
jobs:
  docker:
    runs-on: nix
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Log in to Gitea Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ github.server_url }}
          username: ${{ secrets.CI_USER }}
          password: ${{ secrets.CI_PASSWORD }}
      - name: Make build script executable
        run: chmod +x build-and-copy.sh
      - name: Build image using upstream script
        run: |
          ./build-and-copy.sh -t ${IMAGE_NAME}:${IMAGE_TAG}
      - name: Tag image
        run: |
          docker tag \
            ${IMAGE_NAME}:${IMAGE_TAG} \
            ${{ secrets.REGISTRY_HOST }}/${IMAGE_NAME}:${IMAGE_TAG}
          docker tag \
            ${IMAGE_NAME}:${IMAGE_TAG} \
            ${{ secrets.REGISTRY_HOST }}/${IMAGE_NAME}:${GITEA_SHA::7}
      - name: Push latest
        run: |
          docker push \
            ${{ secrets.REGISTRY_HOST }}/${IMAGE_NAME}:${IMAGE_TAG}
      - name: Push commit SHA
        run: |
          docker push \
            ${{ secrets.REGISTRY_HOST }}/${IMAGE_NAME}:${GITEA_SHA::7}
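
Review note: the login step passes `${{ github.server_url }}` as the registry, while the tag and push steps use `${{ secrets.REGISTRY_HOST }}`. A server URL typically carries a scheme (`https://...`), so the two values may not name the same registry. The `${GITEA_SHA::7}` tags also rely on bash substring expansion and on that variable being set by the runner. The sketch below shows, under those assumptions, how the two image references could be derived from one server URL and one commit SHA. `SERVER_URL`, `COMMIT_SHA`, and the helper functions are hypothetical names for illustration and are not part of the workflow or the repository.

```shell
#!/usr/bin/env bash
# Sketch only: SERVER_URL and COMMIT_SHA are illustrative values,
# not taken from the repository or its runner.
set -euo pipefail

# Reduce a server URL to a bare registry host (no scheme, no path).
registry_host() {
  local url="$1"
  url="${url#*://}"            # drop "https://" or "http://" if present
  printf '%s\n' "${url%%/*}"   # drop everything from the first "/" onward
}

# First seven characters of a commit SHA (bash substring expansion).
short_sha() {
  local sha="$1"
  printf '%s\n' "${sha::7}"
}

# Assemble host/name:tag.
image_ref() {
  printf '%s/%s:%s\n' "$1" "$2" "$3"
}

SERVER_URL="https://gitea.example.com"
COMMIT_SHA="0123456789abcdef0123456789abcdef01234567"
IMAGE_NAME="spark-vllm"

HOST="$(registry_host "$SERVER_URL")"
REF_LATEST="$(image_ref "$HOST" "$IMAGE_NAME" latest)"
REF_SHA="$(image_ref "$HOST" "$IMAGE_NAME" "$(short_sha "$COMMIT_SHA")")"

echo "$REF_LATEST"   # gitea.example.com/spark-vllm:latest
echo "$REF_SHA"      # gitea.example.com/spark-vllm:0123456
```

Deriving the login registry and the push target from the same value keeps the credentials and the image reference pointed at one host. Whether that host should come from the server URL or from `REGISTRY_HOST` depends on how the registry is deployed.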


@@ -1,4 +1,3 @@
# vLLM Docker Optimized for DGX Spark (single or multi-node)
This repository contains the Docker configuration and startup scripts to run a multi-node vLLM inference cluster using Ray. It supports InfiniBand/RDMA (NCCL) and custom environment configuration for high-performance setups.
@@ -1104,6 +1103,7 @@ The script attempts to automatically detect:
* **Cluster peers:** Discovered by scanning the `ETH_IF` subnet for hosts with SSH access **and** a GB10 GPU (`nvidia-smi --query-gpu=name` must return `NVIDIA GB10`).
* **Copy hosts (`COPY_HOSTS`):** In standard mode, same as cluster peers. In mesh mode, scanned separately on `enp1s0f0np0` and `enp1s0f1np1` subnets so that image/model transfers use the direct InfiniBand path.
### Manual Overrides
You can override the auto-detected values if needed: