spark-vllm-docker

Author	SHA1	Message	Date
J.J. Hoesing	358b4795b6	Add `--mkpath` to rsync args to handle the case where `.cache/huggingface/hub` doesn't already exist on the destination.	2026-02-26 03:12:34 -08:00
Eugene Rakhmatulin	dbd3d21fb8	allows $HF_HOME in hf-download.sh	2026-02-25 16:39:12 -08:00
Eugene Rakhmatulin	1c853b725e	Allows using $HF_HOME as the Hugging Face cache directory, closes #68	2026-02-25 16:38:04 -08:00
Eugene Rakhmatulin	5a3536b38e	Fixed a bug where updated tags would cause git fetch to fail	2026-02-24 20:59:54 -08:00
Eugene Rakhmatulin	5ed2c23d0d	Mod for Intel/Qwen3-Coder-Next-INT4-Autoround model	2026-02-24 18:24:42 -08:00
Eugene Rakhmatulin	3c27d521bb	Reverting another breaking vLLM PR, fixes #60	2026-02-23 09:51:45 -08:00
Eugene Rakhmatulin	4c8f90395b	Changed reasoning parser in MiniMax for better compatibility with modern clients (like coding tools).	2026-02-21 11:53:13 -08:00
Eugene Rakhmatulin	349a270c1e	More robust handling of wheels downloads	2026-02-19 13:47:59 -08:00
Eugene Rakhmatulin	ad662f9bab	Changed MXFP4 CUTLASS SHA	2026-02-18 18:20:15 -08:00
Eugene Rakhmatulin	b959818536	MXFP4 fix cache bug	2026-02-18 16:53:57 -08:00
Eugene Rakhmatulin	c60c16e867	Temporary patch to reverse PR that fails builds	2026-02-18 16:20:20 -08:00
Eugene Rakhmatulin	f09c2c3ac8	Refactoring, updated README	2026-02-18 15:58:53 -08:00
Eugene Rakhmatulin	8873a0d959	Handle failed downloads properly	2026-02-18 14:55:43 -08:00
Eugene Rakhmatulin	12fd8a4503	Merge branch 'flashinfer-gen' of gitlab.home.eugr.net:ai/spark-vllm into flashinfer-gen	2026-02-18 14:47:20 -08:00
Eugene Rakhmatulin	34fff7b3fb	Download flashinfer wheels from releases	2026-02-18 14:46:01 -08:00
Eugene Rakhmatulin	a6fdf58a82	Merge branch 'main' into flashinfer-gen	2026-02-18 13:35:41 -08:00
Eugene Rakhmatulin	bd3f45f920	Updated MXFP4 build to use fresh repo references	2026-02-18 13:35:09 -08:00
Eugene Rakhmatulin	b06531f70b	Backup old wheels before rebuilding and restore on failure	2026-02-17 23:13:25 -08:00
Eugene Rakhmatulin	a49b89a0e5	Remove old wheels before rebuilding	2026-02-17 23:04:58 -08:00
Eugene Rakhmatulin	ec0f189256	Initial refactoring to enable separate wheel builds	2026-02-17 19:15:32 -08:00
Eugene Rakhmatulin	5b2313dddb	Changed KV type to fp8 in qwen3-coder-next recipe and reduced default context size to 131072 to ensure it all fits in a single Spark.	2026-02-17 13:07:54 -08:00
Eugene Rakhmatulin	0249f1fdde	Merge branch 'main' into privileged	2026-02-17 13:01:31 -08:00
Eugene Rakhmatulin	ef07046d51	Now using an opened PR for glm-4.7-flash crash fix in the mod	2026-02-17 12:45:17 -08:00
Eugene Rakhmatulin	6aafc9c7d3	Merge branch 'main' into privileged	2026-02-16 11:38:41 -08:00
Eugene Rakhmatulin	1e7f2d5640	Small fix for M2.5 recipe	2026-02-16 11:38:34 -08:00
Eugene Rakhmatulin	bd2085d783	Merge branch 'main' into privileged	2026-02-16 11:36:06 -08:00
Eugene Rakhmatulin	24f42be5cc	Added a recipe for MiniMax M2.5 AWQ	2026-02-16 11:35:53 -08:00
Eugene Rakhmatulin	88a5d09748	Merge branch 'main' into privileged	2026-02-16 09:29:09 -08:00
Eugene Rakhmatulin	c23aff91d3	Temporary fix for #38	2026-02-16 09:23:10 -08:00
Eugene Rakhmatulin	f886505436	Added --non-privileged flag to launch-cluster.sh	2026-02-15 00:12:06 -08:00
Eugene Rakhmatulin	4214d4fefe	Caching cubins during build for reuse	2026-02-13 19:30:28 -08:00
Eugene Rakhmatulin	3470345624	Another fix for the Qwen mod as the slow PR was reversed in main	2026-02-13 13:46:00 -08:00
Eugene Rakhmatulin	c0524608c2	Qwen3-coder-next mod - use a new PR instead of reverting previous one	2026-02-13 12:03:44 -08:00
Eugene Rakhmatulin	701147b1eb	Qwen3-Coder-Next fixes and updated recipe	2026-02-12 15:56:32 -08:00
Eugene Rakhmatulin	da4185cb12	Fixed an issue with fetching latest vLLM code	2026-02-11 22:35:49 -08:00
Eugene Rakhmatulin	3b1e49dcb0	Supporting other CUDA archs via `--gpu-arch` flag	2026-02-11 13:10:41 -08:00
Eugene Rakhmatulin	c6b245cfe8	Added prefix caching to nemotron recipe	2026-02-10 18:25:01 -08:00
Eugene Rakhmatulin	6d3f5dfd5c	map flashinfer/torch/triton cache directories by default	2026-02-10 16:36:02 -08:00
Eugene Rakhmatulin	b990a1b8ac	Fixed #37	2026-02-10 14:31:43 -08:00
Eugene Rakhmatulin	ace16f3a8f	Applied new fastsafetensors fix to mxfp4 build; disabled wheel builds by default	2026-02-09 23:47:06 -08:00
Eugene Rakhmatulin	74876dd442	Added recipes for nemotron-nano-3 and qwen3-coder-next	2026-02-09 14:33:35 -08:00
Eugene Rakhmatulin	3aa5e5dce4	Merge pull request #34	2026-02-09 14:28:30 -08:00
Raphael Amorim	6943a51ced	Adding tests and refactoring repeated methods	2026-02-09 17:21:32 -05:00
Raphael Amorim	d07ad5450f	Adding solo_only option to the recipe	2026-02-09 17:03:57 -05:00
Eugene Rakhmatulin	2923fe6ea5	Removed temp fastsafetensors patch	2026-02-09 10:21:14 -08:00
Eugene Rakhmatulin	06e8817f18	Triton 3.6.0 is now default	2026-02-08 22:38:31 -08:00
Eugene Rakhmatulin	d845cd0401	changed arch to 12.1a again	2026-02-08 14:18:12 -08:00
Eugene Rakhmatulin	5bf422a2ca	Merge branch 'main' into pytorch-base	2026-02-08 13:01:17 -08:00
Eugene Rakhmatulin	15c1506d0c	Merge pull request #32	2026-02-08 07:17:20 -08:00
Raphael Amorim	b7c3cdcfcb	Enhancement: add `--` pass-through for arbitrary vLLM arguments. Implements Unix-style pass-through allowing any vLLM argument to be passed after the `--` separator. Arguments are appended verbatim to the generated vLLM command. Examples: `./run-recipe.py model --solo -- --load-format safetensors`; `./run-recipe.py model --solo -- --served-model-name my-api`; `./run-recipe.py model --solo -- -cc.cudagraph_mode=PIECEWISE`. Features: uses `parse_known_args()` to capture arguments after `--`; warns when extra args duplicate CLI overrides (`--port`, `--tp`, etc.); works in both solo and cluster modes. Adds 10 integration tests covering: `--load-format`, `--served-model-name`, equals syntax; multiple arguments, empty `--`, cluster mode; duplicate detection warnings for port/tp/gpu-mem. Closes #30.	2026-02-08 02:36:49 -05:00
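
The `--` pass-through described in commit b7c3cdcfcb can be illustrated with a minimal Python sketch. This is not the code from `run-recipe.py`: the commit message says the real script uses `parse_known_args()`, whereas this sketch splits the argument list at the first bare `--` by hand to keep the behavior explicit. The helper names (`split_passthrough`, `duplicated_overrides`, `build_command`), the `OVERRIDE_FLAGS` set, the default port, and the shape of the generated `vllm serve` command are all assumptions made for illustration.

```python
import argparse
import shlex

# Hypothetical subset of wrapper options that can also be overridden on the CLI.
OVERRIDE_FLAGS = {"--port", "--tp"}


def split_passthrough(argv):
    """Split argv at the first bare `--` into (wrapper_args, extra_args)."""
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return argv, []


def duplicated_overrides(extra_args):
    """Return long flags in extra_args that duplicate a wrapper override."""
    names = {a.split("=", 1)[0] for a in extra_args if a.startswith("--")}
    return sorted(names & OVERRIDE_FLAGS)


def build_command(argv):
    """Parse the wrapper's own options and append the extra args verbatim."""
    wrapper_args, extra_args = split_passthrough(argv)

    parser = argparse.ArgumentParser(prog="run-recipe-sketch")
    parser.add_argument("model")
    parser.add_argument("--solo", action="store_true")
    parser.add_argument("--port", type=int, default=8000)
    ns = parser.parse_args(wrapper_args)

    # Warn, but do not fail, when a pass-through flag repeats an override.
    for flag in duplicated_overrides(extra_args):
        print(f"warning: {flag} is also set after `--`")

    # Assumed command shape; the real generated command may differ.
    return ["vllm", "serve", ns.model, "--port", str(ns.port)] + extra_args


if __name__ == "__main__":
    argv = ["model", "--solo", "--", "--load-format", "safetensors"]
    print(shlex.join(build_command(argv)))
```

Splitting before parsing means everything after `--` reaches the command untouched, including short dotted forms such as `-cc.cudagraph_mode=PIECEWISE`, while an empty `--` simply yields no extra arguments.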


211 Commits