spark-vllm-docker

Author	SHA1	Message	Date
Eugene Rakhmatulin	da4185cb12	Fixed an issue with fetching latest vLLM code	2026-02-11 22:35:49 -08:00
Eugene Rakhmatulin	3b1e49dcb0	Supporting other CUDA archs via `--gpu-arch` flag	2026-02-11 13:10:41 -08:00
Eugene Rakhmatulin	c6b245cfe8	Added prefix caching to nemotron recipe	2026-02-10 18:25:01 -08:00
Eugene Rakhmatulin	6d3f5dfd5c	map flashinfer/torch/triton cache directories by default	2026-02-10 16:36:02 -08:00
Eugene Rakhmatulin	b990a1b8ac	Fixed #37	2026-02-10 14:31:43 -08:00
Eugene Rakhmatulin	ace16f3a8f	Applied new fastsafetensors fix to mxfp4 build; disabled wheel builds by default	2026-02-09 23:47:06 -08:00
Eugene Rakhmatulin	74876dd442	Added recipes for nemotron-nano-3 and qwen3-coder-next	2026-02-09 14:33:35 -08:00
Eugene Rakhmatulin	3aa5e5dce4	Merge pull request #34	2026-02-09 14:28:30 -08:00
Raphael Amorim	6943a51ced	Adding tests and refactoring repeated methods	2026-02-09 17:21:32 -05:00
Raphael Amorim	d07ad5450f	Adding solo_only option to the recipe	2026-02-09 17:03:57 -05:00
Eugene Rakhmatulin	2923fe6ea5	Removed temp fastsafetensors patch	2026-02-09 10:21:14 -08:00
Eugene Rakhmatulin	06e8817f18	Triton 3.6.0 is now default	2026-02-08 22:38:31 -08:00
Eugene Rakhmatulin	d845cd0401	changed arch to 12.1a again	2026-02-08 14:18:12 -08:00
Eugene Rakhmatulin	5bf422a2ca	Merge branch 'main' into pytorch-base	2026-02-08 13:01:17 -08:00
Eugene Rakhmatulin	15c1506d0c	Merge pull request #32	2026-02-08 07:17:20 -08:00
Raphael Amorim	b7c3cdcfcb	Enhancement: add `--` pass-through for arbitrary vLLM arguments. Implements Unix-style pass-through allowing any vLLM argument to be passed after the `--` separator; arguments are appended verbatim to the generated vLLM command. Examples: `./run-recipe.py model --solo -- --load-format safetensors`; `./run-recipe.py model --solo -- --served-model-name my-api`; `./run-recipe.py model --solo -- -cc.cudagraph_mode=PIECEWISE`. Features: uses parse_known_args() to capture arguments after `--`; warns when extra args duplicate CLI overrides (--port, --tp, etc.); works in both solo and cluster modes. Adds 10 integration tests covering: --load-format, --served-model-name, equals syntax; multiple arguments, empty `--`, cluster mode; duplicate detection warnings for port/tp/gpu-mem. Closes #30	2026-02-08 02:36:49 -05:00
Eugene Rakhmatulin	dfb300e51a	Merge branch 'main' into pytorch-base	2026-02-05 13:54:12 -08:00
Eugene Rakhmatulin	8cb956b972	Updated networking guide	2026-02-05 13:53:57 -08:00
Eugene Rakhmatulin	66210e641d	Merge branch 'main' into pytorch-base	2026-02-04 12:07:06 -08:00
Eugene Rakhmatulin	f139c4b55d	Updated tests	2026-02-04 12:06:30 -08:00
Eugene Rakhmatulin	c7d45157e0	Merge pull request #19	2026-02-04 12:03:20 -08:00
Eugene Rakhmatulin	ec987259a0	Recipes and Launch Script support	2026-02-04 12:01:53 -08:00
Eugene Rakhmatulin	ef6a5eca29	Merge branch 'main' into pr-19	2026-02-04 11:36:59 -08:00
Eugene Rakhmatulin	f7830636af	Cleaning up launch-cluster changes	2026-02-04 11:36:55 -08:00
Raphael Amorim	b1516f688a	fix: Allow PR tests from any branch and add manual trigger	2026-02-03 17:42:09 -05:00
Raphael Amorim	28ba6090fc	Adding suggestions from Eugr and unit tests	2026-02-03 17:32:59 -05:00
Eugene Rakhmatulin	d8e183cc9b	Merge branch 'apply-pr' into pytorch-base	2026-02-03 14:17:46 -08:00
Eugene Rakhmatulin	c42cc56d34	bugfix	2026-02-03 14:17:30 -08:00
Eugene Rakhmatulin	79e646e833	Merge branch 'apply-pr' into pytorch-base	2026-02-03 14:14:45 -08:00
Eugene Rakhmatulin	d7e9f17c2e	vLLM build-time PRs support	2026-02-03 14:14:11 -08:00
Eugene Rakhmatulin	1e5aa060b8	Updated README to include networking guide	2026-02-03 14:14:05 -08:00
Raphael Amorim	30f16f1d4e	feat: Add recipe-based one-click model deployment system. Introduces a YAML recipe system for simplified model deployment: run-recipe.py (main script handling build, download, and launch); run-recipe.sh (Bash wrapper for dependency management); recipes/ (pre-configured recipes for common models): glm-4.7-flash-awq.yaml (GLM-4.7-Flash with AWQ quantization), glm-4.7-nvfp4.yaml (GLM-4.7 with NVFP4, cluster-only), minimax-m2-awq.yaml (MiniMax M2 with AWQ), openai-gpt-oss-120b.yaml (OpenAI GPT-OSS 120B with MXFP4). Key features: auto-discover cluster nodes with --discover, saves to .env; load nodes from .env automatically on subsequent runs; cluster_only flag for models requiring multi-node setup; build_args field for Dockerfile selection (--pre-tf, --exp-mxfp4); solo mode auto-strips --distributed-executor-backend ray; --setup flag for full build + download + run workflow; --dry-run to preview execution without running. Usage: `./run-recipe.sh --discover` (find and save cluster nodes); `./run-recipe.sh glm-4.7-flash-awq --solo --setup`; `./run-recipe.sh glm-4.7-nvfp4 --setup` (uses nodes from .env)	2026-02-03 16:09:12 -05:00
Eugene Rakhmatulin	ecf7f5f7b5	Merge branch 'main' into pytorch-base	2026-02-03 12:55:03 -08:00
Eugene Rakhmatulin	f8eb294c58	Updated README.md and added Networking Guide.	2026-02-03 12:54:38 -08:00
Eugene Rakhmatulin	4b9ab0de7c	Added ability to launch NGC container in the cluster	2026-02-02 16:57:04 -08:00
Eugene Rakhmatulin	997bf9ea0e	Merge branch 'main' into pytorch-base	2026-02-02 12:44:15 -08:00
Eugene Rakhmatulin	4634ee92a2	Added a mod for Nemotron Nano	2026-02-02 11:58:07 -08:00
Eugene Rakhmatulin	37953478f0	changed arch codes again to be in line with upcoming PR	2026-02-02 09:21:48 -08:00
Raphael Amorim	751bc5a47a	Adding sample profile and profile loader	2026-02-02 10:25:53 -05:00
Eugene Rakhmatulin	3c7f91081d	changed arch flags	2026-02-01 16:37:01 -08:00
Eugene Rakhmatulin	5f7d480801	Reverted Triton removal to use system triton package	2026-01-31 23:23:59 -08:00
Eugene Rakhmatulin	133ed9cfb9	bumped up MXFP4 base image version	2026-01-31 16:17:58 -08:00
Eugene Rakhmatulin	c81edce091	bumped up MXFP4 base image version	2026-01-31 16:12:33 -08:00
Eugene Rakhmatulin	9691eed1b0	Disabled Triton build for now	2026-01-31 00:10:52 -08:00
Eugene Rakhmatulin	7c61b4057c	Added Triton compilation to custom build	2026-01-30 23:44:20 -08:00
Eugene Rakhmatulin	0482435848	Restore previous wheels build	2026-01-30 18:43:39 -08:00
Eugene Rakhmatulin	a6d6bafa69	Merge branch 'main' into pytorch-base	2026-01-30 17:06:29 -08:00
Eugene Rakhmatulin	4a4b4e7610	Fixed a bug when solo mode failed on a standalone Spark without configured RoCE.	2026-01-30 16:39:11 -08:00
Eugene Rakhmatulin	a4b524625a	using "from scratch" build for wheels to reduce image size	2026-01-30 16:29:47 -08:00
Eugene Rakhmatulin	518dc0108b	moved deps buster	2026-01-30 15:25:54 -08:00
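
Commit b7c3cdcfcb above describes capturing everything after `--` with parse_known_args(), appending it verbatim to the vLLM command, and warning when a passed-through flag duplicates a CLI override. The following is only a minimal sketch of that idea, not the repository's actual code: the function names (`parse_cli`, `find_duplicates`, `build_command`), the `OVERRIDE_FLAGS` set, and the reduced option list are all assumptions made for illustration.

```python
import argparse

# Hypothetical subset of the CLI overrides the commit message mentions.
OVERRIDE_FLAGS = {"--port"}


def parse_cli(argv):
    """Parse known options; return (namespace, pass-through args)."""
    parser = argparse.ArgumentParser(prog="run-recipe.py")
    parser.add_argument("recipe")
    parser.add_argument("--solo", action="store_true")
    parser.add_argument("--port", type=int)
    # parse_known_args() returns unrecognized arguments instead of failing.
    args, extra = parser.parse_known_args(argv)
    # Depending on the Python version, the "--" separator itself may be
    # left at the front of the leftovers; drop it if it is there.
    if extra and extra[0] == "--":
        extra = extra[1:]
    return args, extra


def find_duplicates(args, extra):
    """List pass-through flags that were also set as CLI overrides."""
    dupes = []
    for token in extra:
        flag = token.split("=", 1)[0]  # handle --flag=value syntax
        dest = flag.lstrip("-").replace("-", "_")
        if flag in OVERRIDE_FLAGS and getattr(args, dest, None) is not None:
            dupes.append(flag)
    return dupes


def build_command(base, extra):
    """Append pass-through arguments verbatim to the base command."""
    return list(base) + list(extra)


if __name__ == "__main__":
    ns, passthrough = parse_cli(
        ["model", "--solo", "--port", "8000", "--", "--port", "9000"]
    )
    for flag in find_duplicates(ns, passthrough):
        print(f"warning: {flag} is also set via a CLI override")
    print(build_command(["vllm", "serve"], passthrough))
```

Stripping only a leading `--` keeps the rest of the pass-through list untouched, so anything the user typed after the separator reaches the generated command in its original order.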


177 Commits