Compare commits
2 Commits
prebuilt-f...f24d177802
| Author | SHA1 | Date |
|---|---|---|
| | f24d177802 | |
| | bb0d120177 | |
53
.gitea/workflows/build.yml
Normal file
@@ -0,0 +1,53 @@
name: Build and Push spark-vllm

on:
  push:
    branches:
      - main
  workflow_dispatch:

env:
  IMAGE_NAME: spark-vllm
  IMAGE_TAG: latest

jobs:
  docker:
    runs-on: nix

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Login to Registry
        run: |
          echo "${{ secrets.REGISTRY_PASSWORD }}" | docker login \
            ${{ secrets.REGISTRY_HOST }} \
            -u "${{ secrets.REGISTRY_USERNAME }}" \
            --password-stdin

      - name: Make build script executable
        run: chmod +x build-and-copy.sh

      - name: Build image using upstream script
        run: |
          ./build-and-copy.sh -t ${IMAGE_NAME}:${IMAGE_TAG}

      - name: Tag image
        run: |
          docker tag \
            ${IMAGE_NAME}:${IMAGE_TAG} \
            ${{ secrets.REGISTRY_HOST }}/${IMAGE_NAME}:${IMAGE_TAG}

          docker tag \
            ${IMAGE_NAME}:${IMAGE_TAG} \
            ${{ secrets.REGISTRY_HOST }}/${IMAGE_NAME}:${GITEA_SHA::7}

      - name: Push latest
        run: |
          docker push \
            ${{ secrets.REGISTRY_HOST }}/${IMAGE_NAME}:${IMAGE_TAG}

      - name: Push commit SHA
        run: |
          docker push \
            ${{ secrets.REGISTRY_HOST }}/${IMAGE_NAME}:${GITEA_SHA::7}
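The tagging steps in this workflow build two image references: one with `IMAGE_TAG` and one with the first seven characters of the commit SHA. The sketch below reproduces that logic outside the runner so it can be checked locally. `short_sha` and `image_refs` are hypothetical helper names, not part of the repository, and the sketch uses `cut` instead of the bash-only `${GITEA_SHA::7}` expansion so it also runs under plain `sh`. It additionally refuses an empty SHA, since truncating an unset variable yields an empty string and therefore a reference ending in a bare `:`. Whether the runner actually exports `GITEA_SHA` is an assumption worth verifying in the runner's environment.

```shell
# short_sha: hypothetical helper. Prints the first 7 characters of a commit SHA.
# Fails when the SHA is empty, which is what an unset variable would produce.
short_sha() {
  if [ -z "$1" ]; then
    echo "commit SHA is empty; refusing to build a tag" >&2
    return 1
  fi
  printf '%s\n' "$1" | cut -c1-7
}

# image_refs: hypothetical helper. Prints the two references the workflow
# tags and pushes, one per line: <host>/<name>:<tag> and <host>/<name>:<short sha>.
# Arguments: registry host, image name, image tag, full commit SHA.
image_refs() {
  short="$(short_sha "$4")" || return 1
  printf '%s/%s:%s\n' "$1" "$2" "$3"
  printf '%s/%s:%s\n' "$1" "$2" "$short"
}
```

A call such as `image_refs registry.example.com spark-vllm latest "$GITEA_SHA"` prints both references, or exits non-zero if the variable is unset. The host name here is a placeholder.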
@@ -1,4 +1,3 @@

# vLLM Docker Optimized for DGX Spark (single or multi-node)

This repository contains the Docker configuration and startup scripts to run a multi-node vLLM inference cluster using Ray. It supports InfiniBand/RDMA (NCCL) and custom environment configuration for high-performance setups.
@@ -1104,6 +1103,7 @@ The script attempts to automatically detect:
* **Cluster peers:** Discovered by scanning the `ETH_IF` subnet for hosts with SSH access **and** a GB10 GPU (`nvidia-smi --query-gpu=name` must return `NVIDIA GB10`).
* **Copy hosts (`COPY_HOSTS`):** In standard mode, same as cluster peers. In mesh mode, scanned separately on `enp1s0f0np0` and `enp1s0f1np1` subnets so that image/model transfers use the direct InfiniBand path.
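
The peer check described above (SSH access plus a GPU named `NVIDIA GB10`) can be sketched as follows. This is an illustration under stated assumptions, not the repository's actual detection code: `is_gb10` and `check_peer` are hypothetical names, and the `--format=csv,noheader` flag and the SSH options are choices made for this sketch, since the README only states which `nvidia-smi` query must match.

```shell
# is_gb10: hypothetical helper. Reads GPU names on stdin, one per line,
# and succeeds if any line is exactly "NVIDIA GB10" (surrounding blanks allowed).
is_gb10() {
  grep -Eq '^[[:space:]]*NVIDIA GB10[[:space:]]*$'
}

# check_peer: hypothetical helper. Succeeds only if the host accepts a
# non-interactive SSH login and reports a GB10 GPU.
# BatchMode=yes prevents password prompts; ConnectTimeout keeps a scan fast.
check_peer() {
  ssh -o BatchMode=yes -o ConnectTimeout=2 "$1" \
    nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null | is_gb10
}
```

Separating the name match from the SSH call keeps the matching rule testable without a GPU or a remote host.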

### Manual Overrides

You can override the auto-detected values if needed: