Eugene Rakhmatulin
5346372f14
More robust wheels check before download
2026-03-05 17:06:57 -08:00
Eugene Rakhmatulin
5f8f988d91
Merge branch 'main' of github.com:eugr/spark-vllm-docker
2026-03-05 16:29:00 -08:00
eugr
3fabd3fb1c
Merge pull request #72 from erikvullings/main
Add Qwen35-35B-A3B recipe in FP8 format
2026-03-05 16:27:50 -08:00
Eugene Rakhmatulin
2d03bc138d
saving flashinfer and vllm commits in wheels directories
2026-03-05 14:41:25 -08:00
Eugene Rakhmatulin
a749fcce87
Added a recipe for qwen3.5-122B-FP8
staging-current-1772696417
staging-current-1772696532
2026-03-04 16:49:39 -08:00
Eugene Rakhmatulin
505a060a7d
vLLM prebuilt wheels support
2026-03-04 16:01:50 -08:00
Eugene Rakhmatulin
ca34ebcffc
Merge branch 'main' into vllm-wheels
2026-03-04 15:59:16 -08:00
oliverjohnwilson
4303f8b6d0
added minimax-m2.5 and qwen3.5-397b-a17B-fp8 recipes to a recipes/4x-spark-cluster/ subdirectory
2026-03-04 16:01:37 -06:00
Eugene Rakhmatulin
2152ef127d
Now can use prebuilt vLLM wheels
2026-03-04 13:33:32 -08:00
Eugene Rakhmatulin
19f06a0d16
Fixed a bug with checking whether we need to download remote wheels
staging-current-1772668424
staging-current-1772668553
2026-03-04 13:00:40 -08:00
Eugene Rakhmatulin
bbd7db2813
revert bumping up base image
staging-current-1772642670
staging-current-1772642791
2026-03-04 07:29:53 -08:00
L.B.R.
50b3ca60f3
Fix shell quoting for exec command arguments
Arguments with special characters (e.g. JSON strings) were passed
unquoted, causing breakage for commands like:
--speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
Use printf %q in launch-cluster.sh and shlex.quote() in run-recipe.py
to properly escape arguments.
2026-03-04 15:22:42 +00:00
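The quoting fix described in this commit can be illustrated with a minimal sketch. It assumes only the standard-library `shlex.quote()` named in the message; the helper `build_exec_command` and the `vllm serve` prefix are hypothetical, for illustration only, and are not taken from run-recipe.py.

```python
import shlex


def build_exec_command(args):
    """Join arguments into a single shell-safe command string.

    Hypothetical helper: each argument is escaped with shlex.quote() so
    that special characters (quotes, braces, spaces) survive the shell.
    """
    return " ".join(shlex.quote(arg) for arg in args)


# Example arguments; the JSON value is the one quoted in the commit message.
args = [
    "vllm",
    "serve",
    "--speculative-config",
    '{"method":"qwen3_next_mtp","num_speculative_tokens":2}',
]

command = build_exec_command(args)
print(command)

# Parsing the quoted string back with shell rules recovers the original list.
assert shlex.split(command) == args
```

Joining the raw arguments with spaces instead would let the shell strip the inner double quotes from the JSON value, which is the kind of breakage the commit describes.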
Eugene Rakhmatulin
fff1a24982
Rolling back base image
2026-03-04 07:19:43 -08:00
Eugene Rakhmatulin
ae19b66fdd
Bumped base image version
2026-03-03 23:31:51 -08:00
Erik Vullings
163f23d85b
Update qwen35-35b-a3b-fp8.yaml
--max_num_batched_tokens is now a default variable that can be overridden via the CLI
2026-03-03 12:46:12 +01:00
Eugene Rakhmatulin
7d8465fd9c
Added recipe for qwen3.5-122b-int4-autoround, updated README
staging-current-1772608818
staging-current-1772608894
staging-current-1772609005
2026-03-02 12:18:16 -08:00
Eugene Rakhmatulin
8f11e7e5ed
Intel/Qwen3.5-122B-A10B-int4-AutoRound support via mods/fix-qwen3.5-autoround
2026-02-27 10:55:42 -08:00
Erik Vullings
e8f94d6b8b
Add Qwen35-35B-A3B recipe in FP8 format
2026-02-27 17:46:06 +01:00
Eugene Rakhmatulin
df88997449
piping exec command to docker logs when running in the daemon mode.
2026-02-26 18:19:38 -08:00
Eugene Rakhmatulin
15888c407a
Merge pull request #62
2026-02-26 15:24:42 -08:00
Eugene Rakhmatulin
c1c3b9d66a
support for daemon mode with exec command
2026-02-26 15:23:08 -08:00
Eugene Rakhmatulin
e9aa411e6c
Merge branch 'main' into pr-62
2026-02-26 14:57:32 -08:00
eugr
4593931421
Merge pull request #70 from hoesing/fix-rsync-path
Fix rsync failure if destination dir doesn't exist
2026-02-26 08:59:05 -08:00
J.J. Hoesing
358b4795b6
Add --mkpath to rsync args to handle the case where .cache/huggingface/hub doesn't already exist on the destination.
2026-02-26 03:12:34 -08:00
Eugene Rakhmatulin
dbd3d21fb8
allows $HF_HOME in hf-download.sh
2026-02-25 16:39:12 -08:00
Eugene Rakhmatulin
1c853b725e
allows using $HF_HOME as the Hugging Face cache directory, closes #68
2026-02-25 16:38:04 -08:00
Eugene Rakhmatulin
5a3536b38e
Fixed a bug where updated tags would cause git fetch to fail
2026-02-24 20:59:54 -08:00
Eugene Rakhmatulin
5ed2c23d0d
Mod for Intel/Qwen3-Coder-Next-INT4-Autoround model
2026-02-24 18:24:42 -08:00
Drew Botwinick
a276a76be2
support daemon mode for ACTION == exec
2026-02-23 23:12:52 -06:00
Eugene Rakhmatulin
3c27d521bb
Reverting another breaking vLLM PR, fixes #60
2026-02-23 09:51:45 -08:00
Eugene Rakhmatulin
4c8f90395b
Changed reasoning parser in MiniMax for better compatibility with modern clients (like coding tools).
2026-02-21 11:53:13 -08:00
Eugene Rakhmatulin
349a270c1e
More robust handling of wheels downloads
2026-02-19 13:47:59 -08:00
Eugene Rakhmatulin
ad662f9bab
Changed MXFP4 CUTLASS SHA
2026-02-18 18:20:15 -08:00
Eugene Rakhmatulin
b959818536
MXFP4 fix cache bug
2026-02-18 16:53:57 -08:00
Eugene Rakhmatulin
c60c16e867
Temporary patch to reverse PR that fails builds
2026-02-18 16:20:20 -08:00
Eugene Rakhmatulin
f09c2c3ac8
Refactoring, updated README
2026-02-18 15:58:53 -08:00
Eugene Rakhmatulin
8873a0d959
Handle failed downloads properly
2026-02-18 14:55:43 -08:00
Eugene Rakhmatulin
12fd8a4503
Merge branch 'flashinfer-gen' of gitlab.home.eugr.net:ai/spark-vllm into flashinfer-gen
2026-02-18 14:47:20 -08:00
Eugene Rakhmatulin
34fff7b3fb
Download flashinfer wheels from releases
2026-02-18 14:46:01 -08:00
Eugene Rakhmatulin
a6fdf58a82
Merge branch 'main' into flashinfer-gen
2026-02-18 13:35:41 -08:00
Eugene Rakhmatulin
bd3f45f920
Updated MXFP4 build to use fresh repo references
2026-02-18 13:35:09 -08:00
Eugene Rakhmatulin
b06531f70b
Backup old wheels before rebuilding and restore on failure
2026-02-17 23:13:25 -08:00
Eugene Rakhmatulin
a49b89a0e5
Remove old wheels before rebuilding
2026-02-17 23:04:58 -08:00
Eugene Rakhmatulin
ec0f189256
Initial refactoring to enable separate wheel builds
2026-02-17 19:15:32 -08:00
Eugene Rakhmatulin
5b2313dddb
Changed KV type to fp8 in qwen3-coder-next recipe and reduced default context size to 131072 to ensure it all fits in a single Spark.
2026-02-17 13:07:54 -08:00
Eugene Rakhmatulin
0249f1fdde
Merge branch 'main' into privileged
2026-02-17 13:01:31 -08:00
Eugene Rakhmatulin
ef07046d51
Now using an opened PR for glm-4.7-flash crash fix in the mod
2026-02-17 12:45:17 -08:00
Eugene Rakhmatulin
6aafc9c7d3
Merge branch 'main' into privileged
2026-02-16 11:38:41 -08:00
Eugene Rakhmatulin
1e7f2d5640
Small fix for M2.5 recipe
2026-02-16 11:38:34 -08:00
Eugene Rakhmatulin
bd2085d783
Merge branch 'main' into privileged
2026-02-16 11:36:06 -08:00