Commit Graph

390 Commits

Author SHA1 Message Date
eugr
e3243bf555 Merge pull request #197 from mmonad/minimax-m2.7-awq-recipe
Add recipe for MiniMax-M2.7-AWQ
2026-04-25 19:26:43 -07:00
Eugene Rakhmatulin
43a00ed90f Fixed #205 2026-04-25 18:39:46 -07:00
eugr
ef9b0e50f4 Merge pull request #210 from Kaweees/main
Update gpu-mem-util-gb: patch with new vLLM default value
2026-04-25 10:00:52 -07:00
Miguel Villa Floran
c1e952de2e Update gpu-mem-util-gb: patch with new vLLM default value 2026-04-24 11:40:41 -07:00
Eugene Rakhmatulin
b13a3600d3 Remove a dependency 2026-04-23 07:47:23 -07:00
Eugene Rakhmatulin
7dea11bbf0 More robust handling of PRs 2026-04-22 13:18:12 -07:00
Eugene Rakhmatulin
c187912e23 Removed merged PRs 2026-04-21 09:47:26 -07:00
L.B.R.
caa28c8e12 Add recipe for MiniMax-M2.7-AWQ
Add a vLLM serving recipe for the MiniMax M2.7 model using
the cyankiwi/MiniMax-M2.7-AWQ-4bit quantization. Uses the
same minimax_m2 tool-call and reasoning parsers as the
existing M2 recipe, with Ray distributed backend on 2 GPUs.
2026-04-18 22:44:26 +01:00
Eugene Rakhmatulin
5415c1fe9e Include a PR to fix broken torch bindings (vllm pr 40191) 2026-04-18 09:19:50 -07:00
Eugene Rakhmatulin
d49fac1b8b Re-enable flashinfer_cutlass 2026-04-16 16:40:56 -07:00
Eugene Rakhmatulin
6b7f8dace6 Fixes #187 2026-04-15 22:32:14 -07:00
Eugene Rakhmatulin
76fbf0d0be Fix for broken MiniMax M2 parser 2026-04-15 16:31:50 -07:00
Eugene Rakhmatulin
b7830469be Updated README 2026-04-14 17:23:42 -07:00
Eugene Rakhmatulin
b50fa426c8 Merge pull request #190 2026-04-14 17:18:56 -07:00
Tim Messerschmidt
2c13e1ce25 Add InstantTensor to runtime dependencies 2026-04-14 19:38:36 +02:00
Eugene Rakhmatulin
c026c92bd0 Updated README 2026-04-13 11:27:57 -07:00
Eugene Rakhmatulin
cf4cb35356 added new flashinfer build dependency 2026-04-13 08:47:34 -07:00
Eugene Rakhmatulin
1ad85442ac Added a helper mod for Qwen3.5-397B recipe 2026-04-12 19:14:23 -07:00
Eugene Rakhmatulin
30919581ee Included .gitignore in wheels 2026-04-11 17:02:39 -07:00
Eugene Rakhmatulin
b7c8616743 Pinned pytorch version 2026-04-11 11:54:46 -07:00
Eugene Rakhmatulin
8e8e850ef1 fix for new requirements structure 2026-04-10 20:14:47 -07:00
Eugene Rakhmatulin
fc08740fba Increased uv timeout 2026-04-10 19:38:38 -07:00
Eugene Rakhmatulin
288da8e911 Mod to fix Gemma4 tool parser 2026-04-04 16:48:07 -07:00
Eugene Rakhmatulin
7bc4e4ce5e Fixes #158 by adding build args to gemma4 recipe 2026-04-04 10:46:06 -07:00
Eugene Rakhmatulin
49d6d9fefd Removed PR2927 as it's been merged 2026-04-03 16:56:00 -07:00
Eugene Rakhmatulin
4afca860a5 Fix broken compilation (PR 38919) 2026-04-03 10:22:10 -07:00
Eugene Rakhmatulin
ed32612cdd A recipe for Gemma4-26B 2026-04-02 23:53:55 -07:00
Eugene Rakhmatulin
44808f7018 Apply vLLM PR 35568 2026-04-02 17:13:54 -07:00
Eugene Rakhmatulin
12caec228e switching gpt-oss-120b to solo only for now 2026-04-01 10:27:50 -07:00
Eugene Rakhmatulin
27eb35f08d Fixed 4x qwen recipe 2026-04-01 10:09:01 -07:00
eugr
3335540972 Merge branch 'pr-152' 2026-04-01 08:59:01 -07:00
eugr
ae25d64ac0 Changed CUTLASS ref for mxfp4 build 2026-04-01 08:58:31 -07:00
Eugene Rakhmatulin
a770865834 Updated PRs to apply 2026-04-01 08:31:34 -07:00
Artyom
7b47235463 Pin nvidia-nvshmem-cu13 to <3.6 in Dockerfile.mxfp4
nvidia-nvshmem-cu13 3.6.5 (released Mar 24) introduced a breaking
change — nvshmemi_device_state_d was removed from NVSHMEM headers,
which breaks FlashInfer AOT compilation of nvshmem_binding.cu.
2026-04-01 07:38:53 +02:00
Eugene Rakhmatulin
3a3ab98b3e Temporarily added PR2897 to Dockerfile 2026-03-31 22:06:08 -07:00
Eugene Rakhmatulin
23fb7dcc20 Merge branch '3-node-autodiscover' 2026-03-31 18:22:23 -07:00
Eugene Rakhmatulin
c4860b86a2 Updated README with 3-node support 2026-03-31 18:19:22 -07:00
Eugene Rakhmatulin
044557943c Bugfixes 2026-03-31 17:49:17 -07:00
Eugene Rakhmatulin
ead749239d Bugfix 2026-03-31 16:57:56 -07:00
Eugene Rakhmatulin
a889fed254 Updated README 2026-03-31 16:54:19 -07:00
Eugene Rakhmatulin
e89104d91b Always rerun discovery when --discover is specified 2026-03-31 16:25:05 -07:00
Eugene Rakhmatulin
15a04ada32 Bug fixes 2026-03-31 16:20:23 -07:00
Eugene Rakhmatulin
a467a7a0bd Updated README for 3-node 2026-03-31 13:47:04 -07:00
Eugene Rakhmatulin
48318380f9 Bugfix 2026-03-31 13:41:35 -07:00
Eugene Rakhmatulin
287d3c72e5 Fix for forced autodiscovery 2026-03-31 13:34:59 -07:00
Eugene Rakhmatulin
9370b2bb34 Don't start the cluster if only --setup/--discover is specified 2026-03-31 13:29:56 -07:00
Eugene Rakhmatulin
bb177383ff Bugfix in autodiscovery dedup 2026-03-31 12:46:15 -07:00
Eugene Rakhmatulin
7f0be29fcc Handle edge case when two sparks have both cables plugged and assigned IPs 2026-03-31 11:59:03 -07:00
Eugene Rakhmatulin
41c0ce2c9a Fixed FI PR 2026-03-30 14:25:42 -07:00
Eugene Rakhmatulin
45494688d1 Updated README, added NVFP4 fix 2026-03-30 11:45:40 -07:00