Commit Graph

394 Commits

Author SHA1 Message Date
Eugene Rakhmatulin
63b2a8dbed fix: add temporary patch for CUDA graphs estimation 2026-03-08 22:43:41 -07:00
eugr
9724619dbd Merge pull request #87 from SeraphimSerapis/fix_wheels_download
fix: skip empty lines in wheel download read loop
2026-03-07 09:34:31 -08:00
Eugene Rakhmatulin
d42c4199fa Unsloth chat template for qwen3.5 [refs: staging-current-1772875976] 2026-03-06 23:35:18 -08:00
Tim Messerschmidt
b9fc32ec34 fix: skip empty lines in wheel download read loop
Add a guard to skip empty lines (e.g. trailing newlines) in the
while-read loop to prevent try_download_wheels from breaking on
unexpected blank input.
2026-03-07 05:06:12 +01:00
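The guard described in commit b9fc32ec34 lives in a shell while-read loop. Purely as an illustration, the same idea can be sketched in Python. The function name `iter_wheel_names` and the sample file names are hypothetical and are not taken from the repository.

```python
def iter_wheel_names(lines):
    """Yield non-empty entries, skipping blank lines such as trailing newlines."""
    for line in lines:
        name = line.strip()
        if not name:
            # Guard: ignore empty input instead of treating it as a wheel name.
            continue
        yield name


# Hypothetical listing with a blank line in the middle and a trailing newline.
listing = "vllm-example.whl\n\nflashinfer-example.whl\n\n"
print(list(iter_wheel_names(listing.split("\n"))))
```

Without the guard, the empty strings produced by the blank and trailing lines would be handed to the download step as if they were wheel names.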
Eugene Rakhmatulin
9dc09bd04b Renamed recipe for qwen3.5-35b-a3b-fp8 to match others 2026-03-06 13:56:06 -08:00
eugr
e88426646b Merge pull request #76 from mmonad/fix-exec-arg-quoting
Fix shell quoting for exec command arguments
2026-03-06 13:45:53 -08:00
mariosaladino
f95beba566 Add -e/--env passthrough to run-recipe.py
Fixes #81. Allows passing environment variables (e.g. HF_TOKEN)
through to the container when launching via recipes, mirroring
the existing -e flag in launch-cluster.sh.

Usage: ./run-recipe.sh glm-4.7-flash-awq --solo -e HF_TOKEN=$HF_TOKEN
2026-03-06 21:50:29 +01:00
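A minimal sketch of how a repeatable -e/--env option could be collected and forwarded, assuming an argparse-based CLI. The helper names `parse_args` and `container_env_flags`, and the KEY=VALUE validation, are hypothetical. The actual implementation in run-recipe.py may differ.

```python
import argparse


def parse_args(argv):
    """Parse a recipe name plus repeatable -e/--env options (hypothetical CLI)."""
    parser = argparse.ArgumentParser(prog="run-recipe")
    parser.add_argument("recipe")
    parser.add_argument("--solo", action="store_true")
    # action="append" collects every -e occurrence into one list.
    parser.add_argument("-e", "--env", action="append", default=[],
                        metavar="KEY=VALUE")
    return parser.parse_args(argv)


def container_env_flags(env_items):
    """Turn KEY=VALUE items into `-e KEY=VALUE` pairs for the container runtime."""
    flags = []
    for item in env_items:
        key, sep, _ = item.partition("=")
        if not key or not sep:
            # Simplification for this sketch: require an explicit KEY=VALUE.
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        flags += ["-e", item]
    return flags


args = parse_args(["glm-4.7-flash-awq", "--solo", "-e", "HF_TOKEN=example"])
print(container_env_flags(args.env))
```

Keeping each variable as a separate list element, rather than concatenating into one string, avoids re-quoting problems when the flags are later passed to a subprocess.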
Olivier Paroz
eb8abcca7f Prevent 169.254.x.x fallback when setting a fixed IP address (#84)
To ensure the interface uses the static IP address we have chosen, it is safer to disable the fallback to avoid problems down the line.
2026-03-06 11:47:47 -08:00
eugr
d148d95a19 Merge pull request #80 from oliverjohnwilson/recipe-add_minimax-m2.5_qwen3.5-397b-a17B-fp8
added minimax-m2.5 and qwen3.5-397b-a17B-fp8 recipes to a recipes/4x-spark-cluster/ subdirectory
2026-03-06 11:46:37 -08:00
Eugene Rakhmatulin
5346372f14 More robust wheels check before download 2026-03-05 17:06:57 -08:00
Eugene Rakhmatulin
5f8f988d91 Merge branch 'main' of github.com:eugr/spark-vllm-docker 2026-03-05 16:29:00 -08:00
eugr
3fabd3fb1c Merge pull request #72 from erikvullings/main
Add Qwen35-35B-A3B recipe in FP8 format
2026-03-05 16:27:50 -08:00
Eugene Rakhmatulin
2d03bc138d saving flashinfer and vllm commits in wheels directories 2026-03-05 14:41:25 -08:00
Eugene Rakhmatulin
a749fcce87 Added a recipe for qwen3.5-122B-FP8 [refs: staging-current-1772696417, staging-current-1772696532] 2026-03-04 16:49:39 -08:00
Eugene Rakhmatulin
505a060a7d vLLM prebuilt wheels support 2026-03-04 16:01:50 -08:00
Eugene Rakhmatulin
ca34ebcffc Merge branch 'main' into vllm-wheels 2026-03-04 15:59:16 -08:00
oliverjohnwilson
4303f8b6d0 added minimax-m2.5 and qwen3.5-397b-a17B-fp8 recipes to a recipes/4x-spark-cluster/ subdirectory 2026-03-04 16:01:37 -06:00
Eugene Rakhmatulin
2152ef127d Now can use prebuilt vLLM wheels 2026-03-04 13:33:32 -08:00
Eugene Rakhmatulin
19f06a0d16 Fixed a bug with checking whether we need to download remote wheels [refs: staging-current-1772668424, staging-current-1772668553] 2026-03-04 13:00:40 -08:00
Eugene Rakhmatulin
bbd7db2813 revert bumping up base image [refs: staging-current-1772642670, staging-current-1772642791] 2026-03-04 07:29:53 -08:00
L.B.R.
50b3ca60f3 Fix shell quoting for exec command arguments
Arguments with special characters (e.g. JSON strings) were passed
unquoted, causing breakage for commands like:
  --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'

Use printf %q in launch-cluster.sh and shlex.quote() in run-recipe.py
to properly escape arguments.
2026-03-04 15:22:42 +00:00
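The Python side of the quoting fix in commit 50b3ca60f3 can be illustrated with a short sketch. `shlex.quote()` and `shlex.split()` are standard-library calls. The helper `build_exec_command` and the `vllm serve` prefix are hypothetical and only serve the example.

```python
import shlex


def build_exec_command(args):
    """Join arguments into one shell-safe command string (hypothetical helper)."""
    return " ".join(shlex.quote(arg) for arg in args)


args = [
    "vllm", "serve",  # illustrative prefix, not taken from the commit
    "--speculative-config",
    '{"method":"qwen3_next_mtp","num_speculative_tokens":2}',
]

quoted = build_exec_command(args)
naive = " ".join(args)

# Round trip: splitting the quoted string recovers the original arguments.
print(shlex.split(quoted) == args)
# The unquoted join loses the JSON double quotes when parsed shell-style.
print(shlex.split(naive) == args)
```

The round-trip check is a convenient way to confirm that an argument containing quotes, braces, or spaces survives being embedded in a command string.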
Eugene Rakhmatulin
fff1a24982 Rolling back base image 2026-03-04 07:19:43 -08:00
Eugene Rakhmatulin
ae19b66fdd Bumped base image version 2026-03-03 23:31:51 -08:00
Erik Vullings
163f23d85b Update qwen35-35b-a3b-fp8.yaml
--max_num_batched_tokens is a default variable now, which can be overridden via the CLI
2026-03-03 12:46:12 +01:00
Eugene Rakhmatulin
7d8465fd9c Added recipe for qwen3.5-122b-int4-autoround, updated README [refs: staging-current-1772608818, staging-current-1772608894, staging-current-1772609005] 2026-03-02 12:18:16 -08:00
Eugene Rakhmatulin
8f11e7e5ed Intel/Qwen3.5-122B-A10B-int4-AutoRound support via mods/fix-qwen3.5-autoround 2026-02-27 10:55:42 -08:00
Erik Vullings
e8f94d6b8b Add Qwen35-35B-A3B recipe in FP8 format 2026-02-27 17:46:06 +01:00
Eugene Rakhmatulin
df88997449 piping exec command to docker logs when running in the daemon mode. 2026-02-26 18:19:38 -08:00
Eugene Rakhmatulin
15888c407a Merge pull request #62 2026-02-26 15:24:42 -08:00
Eugene Rakhmatulin
c1c3b9d66a support for daemon mode with exec command 2026-02-26 15:23:08 -08:00
Eugene Rakhmatulin
e9aa411e6c Merge branch 'main' into pr-62 2026-02-26 14:57:32 -08:00
eugr
4593931421 Merge pull request #70 from hoesing/fix-rsync-path
Fix rsync failure if destination dir doesn't exist
2026-02-26 08:59:05 -08:00
J.J. Hoesing
358b4795b6 Add --mkpath to rsync args to handle the case where .cache/huggingface/hub doesn't already exist on the destination. 2026-02-26 03:12:34 -08:00
Eugene Rakhmatulin
dbd3d21fb8 allows $HF_HOME in hf-download.sh 2026-02-25 16:39:12 -08:00
Eugene Rakhmatulin
1c853b725e allows using $HF_HOME as the Hugging Face cache directory, closes #68 2026-02-25 16:38:04 -08:00
Eugene Rakhmatulin
5a3536b38e Fixed a bug where updated tags would cause git fetch to fail 2026-02-24 20:59:54 -08:00
Eugene Rakhmatulin
5ed2c23d0d Mod for Intel/Qwen3-Coder-Next-INT4-Autoround model 2026-02-24 18:24:42 -08:00
Drew Botwinick
a276a76be2 support daemon mode for ACTION == exec 2026-02-23 23:12:52 -06:00
Eugene Rakhmatulin
3c27d521bb Reverting another breaking vLLM PR, fixes #60 2026-02-23 09:51:45 -08:00
Eugene Rakhmatulin
4c8f90395b Changed reasoning parser in MiniMax for better compatibility with modern clients (like coding tools). 2026-02-21 11:53:13 -08:00
Eugene Rakhmatulin
349a270c1e More robust handling of wheels downloads 2026-02-19 13:47:59 -08:00
Eugene Rakhmatulin
ad662f9bab Changed MXFP4 CUTLASS SHA 2026-02-18 18:20:15 -08:00
Eugene Rakhmatulin
b959818536 MXFP4 fix cache bug 2026-02-18 16:53:57 -08:00
Eugene Rakhmatulin
c60c16e867 Temporary patch to reverse PR that fails builds 2026-02-18 16:20:20 -08:00
Eugene Rakhmatulin
f09c2c3ac8 Refactoring, updated README 2026-02-18 15:58:53 -08:00
Eugene Rakhmatulin
8873a0d959 Handle failed downloads properly 2026-02-18 14:55:43 -08:00
Eugene Rakhmatulin
12fd8a4503 Merge branch 'flashinfer-gen' of gitlab.home.eugr.net:ai/spark-vllm into flashinfer-gen 2026-02-18 14:47:20 -08:00
Eugene Rakhmatulin
34fff7b3fb Download flashinfer wheels from releases 2026-02-18 14:46:01 -08:00
Eugene Rakhmatulin
a6fdf58a82 Merge branch 'main' into flashinfer-gen 2026-02-18 13:35:41 -08:00
Eugene Rakhmatulin
bd3f45f920 Updated MXFP4 build to use fresh repo references 2026-02-18 13:35:09 -08:00