spark-vllm-docker

Author	SHA1	Message	Date
Eugene Rakhmatulin	07fac71dac	Fixed bug with CONTAINER_NAME variable	2026-03-25 14:42:01 -07:00
Eugene Rakhmatulin	ad2cd3373f	.env configuration support for launch-cluster.sh	2026-03-25 14:18:00 -07:00
Eugene Rakhmatulin	c4b078b868	Merge branch 'main' into 3-node	2026-03-24 22:21:25 -07:00
Eugene Rakhmatulin	3be2fb24a8	Merge pull request #122	2026-03-24 22:18:52 -07:00
Eugene Rakhmatulin	7fa69187df	metadata changes	2026-03-24 22:18:07 -07:00
Drew Botwinick	8298c3d7f8	Merge remote-tracking branch 'upstream/main' (conflicts: Dockerfile)	2026-03-24 15:41:09 -05:00
Eugene Rakhmatulin	f8c2653fd3	Quick fix for NCCL dependency	2026-03-23 23:20:59 -07:00
Eugene Rakhmatulin	990a7b3837	Use mesh-optimized NCCL	2026-03-23 15:43:18 -07:00
Eugene Rakhmatulin	9e089acf2b	Updated Nemotron recipes to use VLLM CUTLASS	2026-03-22 23:03:24 -07:00
Eugene Rakhmatulin	2d749742e4	Changed base image back to base CUDA development one	2026-03-21 18:11:20 -07:00
Eugene Rakhmatulin	7a54657abf	Revert "cuda 13.2 torch". This reverts commit `926dd57a87`.	2026-03-21 15:36:17 -07:00
Eugene Rakhmatulin	926dd57a87	cuda 13.2 torch	2026-03-21 15:15:01 -07:00
Eugene Rakhmatulin	6e8d85c914	cleanup	2026-03-21 15:12:12 -07:00
Drew Botwinick	d6e76f8e2f	add build metadata generation and include in Dockerfiles	2026-03-21 16:10:04 -05:00
Eugene Rakhmatulin	8385506c5e	Fixes	2026-03-20 23:51:21 -07:00
Eugene Rakhmatulin	8caebe3155	Reverting back to CUDA image + pytorch from wheels	2026-03-20 17:03:18 -07:00
Eugene Rakhmatulin	919a881cb1	Merge branch 'main' of gitlab.home.eugr.net:ai/spark-vllm	2026-03-18 22:03:25 -07:00
Eugene Rakhmatulin	8ddc259619	Fixed #111	2026-03-18 22:03:04 -07:00
eugr	22f3fa6c21	Merge pull request #103 from apairmont/network_arg: Add docker --network arg to common build flags	2026-03-18 21:48:48 -07:00
Eugene Rakhmatulin	15d295887c	Updated README to reflect `--master-port` parameter	2026-03-18 21:23:28 -07:00
Eugene Rakhmatulin	7e4150feed	Added master-port argument	2026-03-18 16:57:55 -07:00
eugr	7b752c31c5	Merge pull request #110 from voloszad/patch-1: Remove run-cluster-node.sh script copy and permission commands from Dockerfile.mxfp4	2026-03-18 14:54:11 -07:00
Andrej V.	bdd2b10f54	Remove script copy and permission commands from Dockerfile. Removed script copying and permission setting for run-cluster-node.sh.	2026-03-18 21:57:56 +01:00
Eugene Rakhmatulin	2755b62d12	Fixes #108	2026-03-18 13:26:39 -07:00
Eugene Rakhmatulin	f327b92abe	Fixes #106 and #108	2026-03-18 13:06:44 -07:00
Eugene Rakhmatulin	57b458570e	Added experimental Qwen3.5-397B support for dual Spark configuration	2026-03-17 19:05:36 -07:00
Eugene Rakhmatulin	57ed099465	Updated README file to reflect new launch-cluster options.	2026-03-17 16:16:04 -07:00
Eugene Rakhmatulin	fb0687cd1b	Updated README to describe no-ray mode	2026-03-17 15:27:22 -07:00
Eugene Rakhmatulin	ccea2ba861	Bugfixes	2026-03-17 13:54:42 -07:00
Eugene Rakhmatulin	957605498c	Added extra passthrough variables to run-recipe	2026-03-17 13:41:40 -07:00
Eugene Rakhmatulin	b1eeefc0eb	Changed Nemotron-3-Nano-NVFP4 to Marlin backend	2026-03-17 13:10:48 -07:00
Alan Pairmont	b879b7748f	add network arg to common build flags	2026-03-16 12:09:59 -04:00
Eugene Rakhmatulin	fa645f3e4b	bugfixes	2026-03-13 13:39:30 -07:00
Eugene Rakhmatulin	dedbd0a01d	bugfixes	2026-03-13 12:41:48 -07:00
Eugene Rakhmatulin	caa83d9e5b	Bugfixes	2026-03-13 12:32:43 -07:00
Eugene Rakhmatulin	4bcbbaa25a	Bugfixes	2026-03-13 12:23:41 -07:00
Eugene Rakhmatulin	d08266a123	Bugfixes	2026-03-13 12:18:22 -07:00
Eugene Rakhmatulin	03b055d7f0	Major cluster orchestration refactoring to support running without Ray	2026-03-13 11:55:18 -07:00
Eugene Rakhmatulin	d609fecef3	Merge branch 'main' of github.com:eugr/spark-vllm-docker	2026-03-12 15:04:41 -07:00
eugr	7c198b1ceb	Merge pull request #90 from sonusflow/pr/qwen35-397b-tp4: Add Qwen3.5-397B INT4-AutoRound TP=4 recipe (37 tok/s)	2026-03-12 15:04:23 -07:00
Eugene Rakhmatulin	8ae51192e5	Experimental mod to support gpu-memory-utilization-gb	2026-03-12 13:37:44 -07:00
Eugene Rakhmatulin	8fec9bed06	Updated Nemotron to support dual sparks	2026-03-12 13:30:15 -07:00
Eugene Rakhmatulin	6a323cc6f5	Merge pull request #93	2026-03-12 13:00:13 -07:00
Eugene Rakhmatulin	6f9a2f981c	Adjusted model parameters	2026-03-12 12:59:05 -07:00
remi	122edc8229	super nemotron mod & recipe for nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4	2026-03-11 20:53:44 +01:00
Eugene Rakhmatulin	7ceea85647	Fixed qwen3-coder-next-int4-autoround to exclude Ray	2026-03-11 11:20:56 -07:00
Eugene Rakhmatulin	45066e2b16	Updated README	2026-03-11 09:57:34 -07:00
Eugene Rakhmatulin	f2cf11b047	Added a recipe for qwen3-coder-next-int4-autoround	2026-03-11 09:23:23 -07:00
sonusflow	3baca14eb1	Move recipe to 4x-spark-cluster/ and add UMA memory optimizations. Move qwen3.5-397b-int4-autoround.yaml to recipes/4x-spark-cluster/ per maintainer request (multi-node recipes in separate directory); add PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to recipe env; optimize Ray for GB10 UMA (128GB shared CPU/GPU memory): disable Ray dashboard (saves ~1.2 GiB per node), limit Ray object store to 1 GiB (default 30% of RAM = 33 GiB), disable pre-started idle workers (saves ~8 GiB on head node), set --num-cpus 2 and --disable-usage-stats on all nodes. Net effect: ~40+ GiB freed across 4-node cluster for model/KV cache.	2026-03-11 07:29:45 +00:00
Eugene Rakhmatulin	66b5c85907	Merge branch 'main' of github.com:eugr/spark-vllm-docker	2026-03-10 10:29:10 -07:00

297 Commits