spark-vllm-docker

Author	SHA1	Message	Date
Eugene Rakhmatulin	2d749742e4	Changed base image back to base CUDA development one	2026-03-21 18:11:20 -07:00
Eugene Rakhmatulin	7a54657abf	Revert "cuda 13.2 torch". This reverts commit `926dd57a87`.	2026-03-21 15:36:17 -07:00
Eugene Rakhmatulin	926dd57a87	cuda 13.2 torch	2026-03-21 15:15:01 -07:00
Eugene Rakhmatulin	6e8d85c914	cleanup	2026-03-21 15:12:12 -07:00
Eugene Rakhmatulin	8385506c5e	Fixes	2026-03-20 23:51:21 -07:00
Eugene Rakhmatulin	8caebe3155	Reverting back to CUDA image + pytorch from wheels	2026-03-20 17:03:18 -07:00
Eugene Rakhmatulin	919a881cb1	Merge branch 'main' of gitlab.home.eugr.net:ai/spark-vllm	2026-03-18 22:03:25 -07:00
Eugene Rakhmatulin	8ddc259619	Fixed #111	2026-03-18 22:03:04 -07:00
eugr	22f3fa6c21	Merge pull request #103 from apairmont/network_arg: Add docker --network arg to common build flags	2026-03-18 21:48:48 -07:00
Eugene Rakhmatulin	15d295887c	Updated README to reflect `--master-port` parameter	2026-03-18 21:23:28 -07:00
Eugene Rakhmatulin	7e4150feed	Added master-port argument	2026-03-18 16:57:55 -07:00
eugr	7b752c31c5	Merge pull request #110 from voloszad/patch-1: Remove run-cluster-node.sh script copy and permission commands from Dockerfile.mxfp4	2026-03-18 14:54:11 -07:00
Andrej V.	bdd2b10f54	Remove script copy and permission commands from Dockerfile. Removed script copying and permission setting for run-cluster-node.sh.	2026-03-18 21:57:56 +01:00
Eugene Rakhmatulin	2755b62d12	Fixes #108	2026-03-18 13:26:39 -07:00
Eugene Rakhmatulin	f327b92abe	Fixes #106 and #108	2026-03-18 13:06:44 -07:00
Eugene Rakhmatulin	57b458570e	Added experimental Qwen3.5-397B support for dual Spark configuration	2026-03-17 19:05:36 -07:00
Eugene Rakhmatulin	57ed099465	Updated README file to reflect new launch-cluster options.	2026-03-17 16:16:04 -07:00
Eugene Rakhmatulin	fb0687cd1b	Updated README to describe no-ray mode	2026-03-17 15:27:22 -07:00
Eugene Rakhmatulin	ccea2ba861	Bugfixes	2026-03-17 13:54:42 -07:00
Eugene Rakhmatulin	957605498c	Added extra passthrough variables to run-recipe	2026-03-17 13:41:40 -07:00
Eugene Rakhmatulin	b1eeefc0eb	Changed Nemotron-3-Nano-NVFP4 to Marlin backend	2026-03-17 13:10:48 -07:00
Alan Pairmont	b879b7748f	add network arg to common build flags	2026-03-16 12:09:59 -04:00
Eugene Rakhmatulin	fa645f3e4b	bugfixes	2026-03-13 13:39:30 -07:00
Eugene Rakhmatulin	dedbd0a01d	bugfixes	2026-03-13 12:41:48 -07:00
Eugene Rakhmatulin	caa83d9e5b	Bugfixes	2026-03-13 12:32:43 -07:00
Eugene Rakhmatulin	4bcbbaa25a	Bugfixes	2026-03-13 12:23:41 -07:00
Eugene Rakhmatulin	d08266a123	Bugfixes	2026-03-13 12:18:22 -07:00
Eugene Rakhmatulin	03b055d7f0	Major cluster orchestration refactoring to support running without Ray	2026-03-13 11:55:18 -07:00
Eugene Rakhmatulin	d609fecef3	Merge branch 'main' of github.com:eugr/spark-vllm-docker	2026-03-12 15:04:41 -07:00
eugr	7c198b1ceb	Merge pull request #90 from sonusflow/pr/qwen35-397b-tp4: Add Qwen3.5-397B INT4-AutoRound TP=4 recipe (37 tok/s)	2026-03-12 15:04:23 -07:00
Eugene Rakhmatulin	8ae51192e5	Experimental mod to support gpu-memory-utilization-gb	2026-03-12 13:37:44 -07:00
Eugene Rakhmatulin	8fec9bed06	Updated Nemotron to support dual sparks	2026-03-12 13:30:15 -07:00
Eugene Rakhmatulin	6a323cc6f5	Merge pull request #93	2026-03-12 13:00:13 -07:00
Eugene Rakhmatulin	6f9a2f981c	Adjusted model parameters	2026-03-12 12:59:05 -07:00
remi	122edc8229	super nemotron mod & recipe for nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4	2026-03-11 20:53:44 +01:00
Eugene Rakhmatulin	7ceea85647	Fixed qwen3-coder-next-int4-autoround to exclude Ray	2026-03-11 11:20:56 -07:00
Eugene Rakhmatulin	45066e2b16	Updated README	2026-03-11 09:57:34 -07:00
Eugene Rakhmatulin	f2cf11b047	Added a recipe for qwen3-coder-next-int4-autoround	2026-03-11 09:23:23 -07:00
sonusflow	3baca14eb1	Move recipe to 4x-spark-cluster/ and add UMA memory optimizations. Move qwen3.5-397b-int4-autoround.yaml to recipes/4x-spark-cluster/ per maintainer request (multi-node recipes in separate directory); add PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to recipe env; optimize Ray for GB10 UMA (128GB shared CPU/GPU memory): disable Ray dashboard (saves ~1.2 GiB per node), limit Ray object store to 1 GiB (default 30% of RAM = 33 GiB), disable pre-started idle workers (saves ~8 GiB on head node), set --num-cpus 2 and --disable-usage-stats on all nodes. Net effect: ~40+ GiB freed across 4-node cluster for model/KV cache.	2026-03-11 07:29:45 +00:00
Eugene Rakhmatulin	66b5c85907	Merge branch 'main' of github.com:eugr/spark-vllm-docker	2026-03-10 10:29:10 -07:00
eugr	0019bdf5ed	Merge pull request #85 from saladinomario/feat/recipe-env-passthrough: Add -e/--env passthrough to run-recipe.py	2026-03-10 10:28:29 -07:00
sonusflow	006734910c	Add Qwen3.5-397B INT4-AutoRound TP=4 recipe and Marlin fix. Production-tested recipe for running Qwen3.5-397B-A17B with INT4 AutoRound quantization across 4 DGX Spark nodes using tensor parallelism. Performance (4× DGX Spark, driver 580.126.09): single user 37 tok/s; 4 concurrent ~26 tok/s per user, ~103 tok/s aggregate. The Marlin TP fix resolves the MIN_THREAD_N=64 constraint that breaks in_proj_ba layers at TP=4 (output_size=128/4=32 < 64). Solution: ReplicatedLinear for B/A projections, applied via diff patches. Key config: VLLM_MARLIN_USE_ATOMIC_ADD=1 (required for Marlin correctness); KV cache FP8, prefix caching enabled; gpu_memory_utilization 0.78 (UMA safe margin); CUDAGraphs enabled (default, requires driver 580.x). Note: Driver 590.x has CUDAGraph capture deadlock on GB10 unified memory. Stay on driver 580.126.09.	2026-03-09 21:30:28 +00:00
Eugene Rakhmatulin	e225c709fb	Revert "fix: add temporary patch for CUDA graphs estimation" as it has been merged to main. This reverts commit `63b2a8dbed`.	2026-03-09 09:46:50 -07:00
Eugene Rakhmatulin	63b2a8dbed	fix: add temporary patch for CUDA graphs estimation	2026-03-08 22:43:41 -07:00
eugr	9724619dbd	Merge pull request #87 from SeraphimSerapis/fix_wheels_download: fix: skip empty lines in wheel download read loop	2026-03-07 09:34:31 -08:00
Eugene Rakhmatulin	d42c4199fa	Unsloth chat template for qwen3.5 staging-current-1772875976	2026-03-06 23:35:18 -08:00
Tim Messerschmidt	b9fc32ec34	fix: skip empty lines in wheel download read loop. Add a guard to skip empty lines (e.g. trailing newlines) in the while-read loop to prevent try_download_wheels from breaking on unexpected blank input.	2026-03-07 05:06:12 +01:00
Eugene Rakhmatulin	9dc09bd04b	Renamed recipe for qwen3.5-35b-a3b-fp8 to match others	2026-03-06 13:56:06 -08:00
eugr	e88426646b	Merge pull request #76 from mmonad/fix-exec-arg-quoting: Fix shell quoting for exec command arguments	2026-03-06 13:45:53 -08:00
mariosaladino	f95beba566	Add -e/--env passthrough to run-recipe.py. Fixes #81. Allows passing environment variables (e.g. HF_TOKEN) through to the container when launching via recipes, mirroring the existing -e flag in launch-cluster.sh. Usage: `./run-recipe.sh glm-4.7-flash-awq --solo -e HF_TOKEN=$HF_TOKEN`	2026-03-06 21:50:29 +01:00
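
The `-e/--env` passthrough described in commit f95beba566 can be pictured roughly as follows. This is only a minimal sketch under assumptions, not the code in `run-recipe.py`: the function names `parse_env_args` and `docker_env_flags` are hypothetical, and the only details taken from the commit message are the `-e`/`--env` option, the `--solo` flag, and the recipe name as a positional argument. It assumes each variable is forwarded to the container through Docker's standard `-e KEY=VALUE` option.

```python
import argparse


def parse_env_args(argv):
    """Collect repeated -e/--env KEY=VALUE options into a dict (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("recipe")
    parser.add_argument("--solo", action="store_true")
    # action="append" lets the option be given several times.
    parser.add_argument("-e", "--env", action="append", default=[],
                        metavar="KEY=VALUE")
    args = parser.parse_args(argv)

    env = {}
    for item in args.env:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = item.partition("=")
        if not sep or not key:
            parser.error(f"expected KEY=VALUE, got {item!r}")
        env[key] = value
    return args, env


def docker_env_flags(env):
    """Turn the collected variables into `docker run` style -e arguments."""
    flags = []
    for key, value in env.items():
        flags += ["-e", f"{key}={value}"]
    return flags


args, env = parse_env_args(
    ["glm-4.7-flash-awq", "--solo", "-e", "HF_TOKEN=abc123"]
)
print(docker_env_flags(env))
```

Building the flags as a list of separate arguments, rather than one joined string, avoids shell quoting problems when a value contains spaces.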

287 Commits