Commit Graph

221 Commits

Author SHA1 Message Date
Eugene Rakhmatulin
fff1a24982 Rolling back base image 2026-03-04 07:19:43 -08:00
Eugene Rakhmatulin
ae19b66fdd Bumped base image version 2026-03-03 23:31:51 -08:00
Eugene Rakhmatulin
7d8465fd9c Added recipe for qwen3.5-122b-int4-autoround, updated README staging-current-1772608818 staging-current-1772608894 staging-current-1772609005 2026-03-02 12:18:16 -08:00
Eugene Rakhmatulin
8f11e7e5ed Intel/Qwen3.5-122B-A10B-int4-AutoRound support via mods/fix-qwen3.5-autoround 2026-02-27 10:55:42 -08:00
Eugene Rakhmatulin
df88997449 piping exec command to docker logs when running in the daemon mode. 2026-02-26 18:19:38 -08:00
Eugene Rakhmatulin
15888c407a Merge pull request #62 2026-02-26 15:24:42 -08:00
Eugene Rakhmatulin
c1c3b9d66a support for daemon mode with exec command 2026-02-26 15:23:08 -08:00
Eugene Rakhmatulin
e9aa411e6c Merge branch 'main' into pr-62 2026-02-26 14:57:32 -08:00
eugr
4593931421 Merge pull request #70 from hoesing/fix-rsync-path: Fix rsync failure if destination dir doesn't exist 2026-02-26 08:59:05 -08:00
J.J. Hoesing
358b4795b6 Add --mkpath to rsync args to handle the case where .cache/huggingface/hub doesn't already exist on the destination. 2026-02-26 03:12:34 -08:00
Eugene Rakhmatulin
dbd3d21fb8 allows $HF_HOME in hf-download.sh 2026-02-25 16:39:12 -08:00
Eugene Rakhmatulin
1c853b725e allows to use $HF_HOME as huggingface cache directory, closes #68 2026-02-25 16:38:04 -08:00
Eugene Rakhmatulin
5a3536b38e Fixed a bug where updated tags would cause git fetch to fail 2026-02-24 20:59:54 -08:00
Eugene Rakhmatulin
5ed2c23d0d Mod for Intel/Qwen3-Coder-Next-INT4-Autoround model 2026-02-24 18:24:42 -08:00
Drew Botwinick
a276a76be2 support daemon mode for ACTION == exec 2026-02-23 23:12:52 -06:00
Eugene Rakhmatulin
3c27d521bb Reverting another breaking vLLM PR, fixes #60 2026-02-23 09:51:45 -08:00
Eugene Rakhmatulin
4c8f90395b Changed reasoning parser in MiniMax for better compatibility with modern clients (like coding tools). 2026-02-21 11:53:13 -08:00
Eugene Rakhmatulin
349a270c1e More robust handling of wheels downloads 2026-02-19 13:47:59 -08:00
Eugene Rakhmatulin
ad662f9bab Changed MXFP4 CUTLASS SHA 2026-02-18 18:20:15 -08:00
Eugene Rakhmatulin
b959818536 MXFP4 fix cache bug 2026-02-18 16:53:57 -08:00
Eugene Rakhmatulin
c60c16e867 Temporary patch to reverse PR that fails builds 2026-02-18 16:20:20 -08:00
Eugene Rakhmatulin
f09c2c3ac8 Refactoring, updated README 2026-02-18 15:58:53 -08:00
Eugene Rakhmatulin
8873a0d959 Handle failed downloads properly 2026-02-18 14:55:43 -08:00
Eugene Rakhmatulin
12fd8a4503 Merge branch 'flashinfer-gen' of gitlab.home.eugr.net:ai/spark-vllm into flashinfer-gen 2026-02-18 14:47:20 -08:00
Eugene Rakhmatulin
34fff7b3fb Download flashinfer wheels from releases 2026-02-18 14:46:01 -08:00
Eugene Rakhmatulin
a6fdf58a82 Merge branch 'main' into flashinfer-gen 2026-02-18 13:35:41 -08:00
Eugene Rakhmatulin
bd3f45f920 Updated MXFP4 build to use fresh repo references 2026-02-18 13:35:09 -08:00
Eugene Rakhmatulin
b06531f70b Backup old wheels before rebuilding and restore on failure 2026-02-17 23:13:25 -08:00
Eugene Rakhmatulin
a49b89a0e5 Remove old wheels before rebuilding 2026-02-17 23:04:58 -08:00
Eugene Rakhmatulin
ec0f189256 Initial refactoring to enable separate wheel builds 2026-02-17 19:15:32 -08:00
Eugene Rakhmatulin
5b2313dddb Changed KV type to fp8 in qwen3-coder-next recipe and reduced default context size to 131072 to ensure it all fits in a single Spark. 2026-02-17 13:07:54 -08:00
Eugene Rakhmatulin
0249f1fdde Merge branch 'main' into privileged 2026-02-17 13:01:31 -08:00
Eugene Rakhmatulin
ef07046d51 Now using an opened PR for glm-4.7-flash crash fix in the mod 2026-02-17 12:45:17 -08:00
Eugene Rakhmatulin
6aafc9c7d3 Merge branch 'main' into privileged 2026-02-16 11:38:41 -08:00
Eugene Rakhmatulin
1e7f2d5640 Small fix for M2.5 recipe 2026-02-16 11:38:34 -08:00
Eugene Rakhmatulin
bd2085d783 Merge branch 'main' into privileged 2026-02-16 11:36:06 -08:00
Eugene Rakhmatulin
24f42be5cc Added a recipe for MiniMax M2.5 AWQ 2026-02-16 11:35:53 -08:00
Eugene Rakhmatulin
88a5d09748 Merge branch 'main' into privileged 2026-02-16 09:29:09 -08:00
Eugene Rakhmatulin
c23aff91d3 Temporary fix for #38 2026-02-16 09:23:10 -08:00
Eugene Rakhmatulin
f886505436 Added --non-privileged flag to launch-cluster.sh 2026-02-15 00:12:06 -08:00
Eugene Rakhmatulin
4214d4fefe Caching cubins during build for reuse 2026-02-13 19:30:28 -08:00
Eugene Rakhmatulin
3470345624 Another fix for the Qwen mod as the slow PR was reversed in main 2026-02-13 13:46:00 -08:00
Eugene Rakhmatulin
c0524608c2 Qwen3-coder-next mod - use a new PR instead of reverting previous one 2026-02-13 12:03:44 -08:00
Eugene Rakhmatulin
701147b1eb Qwen3-Coder-Next fixes and updated recipe 2026-02-12 15:56:32 -08:00
Eugene Rakhmatulin
da4185cb12 Fixed an issue with fetching latest vLLM code 2026-02-11 22:35:49 -08:00
Eugene Rakhmatulin
3b1e49dcb0 Supporting other CUDA archs via --gpu-arch flag 2026-02-11 13:10:41 -08:00
Eugene Rakhmatulin
c6b245cfe8 Added prefix caching to nemotron recipe 2026-02-10 18:25:01 -08:00
Eugene Rakhmatulin
6d3f5dfd5c map flashinfer/torch/triton cache directories by default 2026-02-10 16:36:02 -08:00
Eugene Rakhmatulin
b990a1b8ac Fixed #37 2026-02-10 14:31:43 -08:00
Eugene Rakhmatulin
ace16f3a8f Applied new fastsafetensors fix to mxfp4 build; disabled wheel builds by default 2026-02-09 23:47:06 -08:00