Eugene Rakhmatulin
|
5f8f988d91
|
Merge branch 'main' of github.com:eugr/spark-vllm-docker
|
2026-03-05 16:29:00 -08:00 |
|
eugr
|
3fabd3fb1c
|
Merge pull request #72 from erikvullings/main
Add Qwen35-35B-A3B recipe in FP8 format
|
2026-03-05 16:27:50 -08:00 |
|
Eugene Rakhmatulin
|
2d03bc138d
|
saving flashinfer and vllm commits in wheels directories
|
2026-03-05 14:41:25 -08:00 |
|
Eugene Rakhmatulin
|
a749fcce87
|
Added a recipe for qwen3.5-122B-FP8
staging-current-1772696417
staging-current-1772696532
|
2026-03-04 16:49:39 -08:00 |
|
Eugene Rakhmatulin
|
505a060a7d
|
vLLM prebuilt wheels support
|
2026-03-04 16:01:50 -08:00 |
|
Eugene Rakhmatulin
|
ca34ebcffc
|
Merge branch 'main' into vllm-wheels
|
2026-03-04 15:59:16 -08:00 |
|
Eugene Rakhmatulin
|
2152ef127d
|
Now can use prebuilt vLLM wheels
|
2026-03-04 13:33:32 -08:00 |
|
Eugene Rakhmatulin
|
19f06a0d16
|
Fixed a bug with checking whether we need to download remote wheels
staging-current-1772668424
staging-current-1772668553
|
2026-03-04 13:00:40 -08:00 |
|
Eugene Rakhmatulin
|
bbd7db2813
|
revert bumping up base image
staging-current-1772642670
staging-current-1772642791
|
2026-03-04 07:29:53 -08:00 |
|
Eugene Rakhmatulin
|
fff1a24982
|
Rolling back base image
|
2026-03-04 07:19:43 -08:00 |
|
Eugene Rakhmatulin
|
ae19b66fdd
|
Bumped base image version
|
2026-03-03 23:31:51 -08:00 |
|
Erik Vullings
|
163f23d85b
|
Update qwen35-35b-a3b-fp8.yaml
--max_num_batched_tokens is a default variable now, which can be overridden via the CLI
|
2026-03-03 12:46:12 +01:00 |
|
Eugene Rakhmatulin
|
7d8465fd9c
|
Added recipe for qwen3.5-122b-int4-autoround, updated README
staging-current-1772608818
staging-current-1772608894
staging-current-1772609005
|
2026-03-02 12:18:16 -08:00 |
|
Eugene Rakhmatulin
|
8f11e7e5ed
|
Intel/Qwen3.5-122B-A10B-int4-AutoRound support via mods/fix-qwen3.5-autoround
|
2026-02-27 10:55:42 -08:00 |
|
Erik Vullings
|
e8f94d6b8b
|
Add Qwen35-35B-A3B recipe in FP8 format
|
2026-02-27 17:46:06 +01:00 |
|
Eugene Rakhmatulin
|
df88997449
|
piping exec command to docker logs when running in daemon mode.
|
2026-02-26 18:19:38 -08:00 |
|
Eugene Rakhmatulin
|
15888c407a
|
Merge pull request #62
|
2026-02-26 15:24:42 -08:00 |
|
Eugene Rakhmatulin
|
c1c3b9d66a
|
support for daemon mode with exec command
|
2026-02-26 15:23:08 -08:00 |
|
Eugene Rakhmatulin
|
e9aa411e6c
|
Merge branch 'main' into pr-62
|
2026-02-26 14:57:32 -08:00 |
|
eugr
|
4593931421
|
Merge pull request #70 from hoesing/fix-rsync-path
Fix rsync failure if destination dir doesn't exist
|
2026-02-26 08:59:05 -08:00 |
|
J.J. Hoesing
|
358b4795b6
|
Add --mkpath to rsync args to handle the case where .cache/huggingface/hub doesn't already exist on the destination.
|
2026-02-26 03:12:34 -08:00 |
|
Eugene Rakhmatulin
|
dbd3d21fb8
|
allows $HF_HOME in hf-download.sh
|
2026-02-25 16:39:12 -08:00 |
|
Eugene Rakhmatulin
|
1c853b725e
|
allows using $HF_HOME as the Hugging Face cache directory, closes #68
|
2026-02-25 16:38:04 -08:00 |
|
Eugene Rakhmatulin
|
5a3536b38e
|
Fixed a bug where updated tags would cause git fetch to fail
|
2026-02-24 20:59:54 -08:00 |
|
Eugene Rakhmatulin
|
5ed2c23d0d
|
Mod for Intel/Qwen3-Coder-Next-INT4-Autoround model
|
2026-02-24 18:24:42 -08:00 |
|
Drew Botwinick
|
a276a76be2
|
support daemon mode for ACTION == exec
|
2026-02-23 23:12:52 -06:00 |
|
Eugene Rakhmatulin
|
3c27d521bb
|
Reverting another breaking vLLM PR, fixes #60
|
2026-02-23 09:51:45 -08:00 |
|
Eugene Rakhmatulin
|
4c8f90395b
|
Changed reasoning parser in MiniMax for better compatibility with modern clients (like coding tools).
|
2026-02-21 11:53:13 -08:00 |
|
Eugene Rakhmatulin
|
349a270c1e
|
More robust handling of wheels downloads
|
2026-02-19 13:47:59 -08:00 |
|
Eugene Rakhmatulin
|
ad662f9bab
|
Changed MXFP4 CUTLASS SHA
|
2026-02-18 18:20:15 -08:00 |
|
Eugene Rakhmatulin
|
b959818536
|
MXFP4 fix cache bug
|
2026-02-18 16:53:57 -08:00 |
|
Eugene Rakhmatulin
|
c60c16e867
|
Temporary patch to reverse PR that fails builds
|
2026-02-18 16:20:20 -08:00 |
|
Eugene Rakhmatulin
|
f09c2c3ac8
|
Refactoring, updated README
|
2026-02-18 15:58:53 -08:00 |
|
Eugene Rakhmatulin
|
8873a0d959
|
Handle failed downloads properly
|
2026-02-18 14:55:43 -08:00 |
|
Eugene Rakhmatulin
|
12fd8a4503
|
Merge branch 'flashinfer-gen' of gitlab.home.eugr.net:ai/spark-vllm into flashinfer-gen
|
2026-02-18 14:47:20 -08:00 |
|
Eugene Rakhmatulin
|
34fff7b3fb
|
Download flashinfer wheels from releases
|
2026-02-18 14:46:01 -08:00 |
|
Eugene Rakhmatulin
|
a6fdf58a82
|
Merge branch 'main' into flashinfer-gen
|
2026-02-18 13:35:41 -08:00 |
|
Eugene Rakhmatulin
|
bd3f45f920
|
Updated MXFP4 build to use fresh repo references
|
2026-02-18 13:35:09 -08:00 |
|
Eugene Rakhmatulin
|
b06531f70b
|
Backup old wheels before rebuilding and restore on failure
|
2026-02-17 23:13:25 -08:00 |
|
Eugene Rakhmatulin
|
a49b89a0e5
|
Remove old wheels before rebuilding
|
2026-02-17 23:04:58 -08:00 |
|
Eugene Rakhmatulin
|
ec0f189256
|
Initial refactoring to enable separate wheel builds
|
2026-02-17 19:15:32 -08:00 |
|
Eugene Rakhmatulin
|
5b2313dddb
|
Changed KV type to fp8 in qwen3-coder-next recipe and reduced default context size to 131072 to ensure it all fits in a single Spark.
|
2026-02-17 13:07:54 -08:00 |
|
Eugene Rakhmatulin
|
0249f1fdde
|
Merge branch 'main' into privileged
|
2026-02-17 13:01:31 -08:00 |
|
Eugene Rakhmatulin
|
ef07046d51
|
Now using an opened PR for glm-4.7-flash crash fix in the mod
|
2026-02-17 12:45:17 -08:00 |
|
Eugene Rakhmatulin
|
6aafc9c7d3
|
Merge branch 'main' into privileged
|
2026-02-16 11:38:41 -08:00 |
|
Eugene Rakhmatulin
|
1e7f2d5640
|
Small fix for M2.5 recipe
|
2026-02-16 11:38:34 -08:00 |
|
Eugene Rakhmatulin
|
bd2085d783
|
Merge branch 'main' into privileged
|
2026-02-16 11:36:06 -08:00 |
|
Eugene Rakhmatulin
|
24f42be5cc
|
Added a recipe for MiniMax M2.5 AWQ
|
2026-02-16 11:35:53 -08:00 |
|
Eugene Rakhmatulin
|
88a5d09748
|
Merge branch 'main' into privileged
|
2026-02-16 09:29:09 -08:00 |
|
Eugene Rakhmatulin
|
c23aff91d3
|
Temporary fix for #38
|
2026-02-16 09:23:10 -08:00 |
|