Commit Graph

100 Commits

Author SHA1 Message Date
Eugene Rakhmatulin
5415c1fe9e Include a PR to fix broken torch bindings (vllm pr 40191) 2026-04-18 09:19:50 -07:00
Eugene Rakhmatulin
d49fac1b8b Re-enable flashinfer_cutlass 2026-04-16 16:40:56 -07:00
Eugene Rakhmatulin
76fbf0d0be Fix for broken MiniMax M2 parser 2026-04-15 16:31:50 -07:00
Tim Messerschmidt
2c13e1ce25 Add InstantTensor to runtime dependencies 2026-04-14 19:38:36 +02:00
Eugene Rakhmatulin
cf4cb35356 added new flashinfer build dependency 2026-04-13 08:47:34 -07:00
Eugene Rakhmatulin
b7c8616743 Pinned pytorch version 2026-04-11 11:54:46 -07:00
Eugene Rakhmatulin
8e8e850ef1 fix for new requirements structure 2026-04-10 20:14:47 -07:00
Eugene Rakhmatulin
fc08740fba Increased uv timeout 2026-04-10 19:38:38 -07:00
Eugene Rakhmatulin
49d6d9fefd Removed PR2927 as it's been merged 2026-04-03 16:56:00 -07:00
Eugene Rakhmatulin
4afca860a5 Fix broken compilation (PR 38919) 2026-04-03 10:22:10 -07:00
Eugene Rakhmatulin
44808f7018 Apply vLLM PR 35568 2026-04-02 17:13:54 -07:00
Eugene Rakhmatulin
a770865834 Updated PRs to apply 2026-04-01 08:31:34 -07:00
Eugene Rakhmatulin
3a3ab98b3e Temporarily added PR2897 to Dockerfile 2026-03-31 22:06:08 -07:00
Eugene Rakhmatulin
41c0ce2c9a Fixed FI PR 2026-03-30 14:25:42 -07:00
Eugene Rakhmatulin
45494688d1 Updated README, added NVFP4 fix 2026-03-30 11:45:40 -07:00
Eugene Rakhmatulin
a3201f8873 --flashinfer-ref / --apply-flashinfer-pr 2026-03-29 22:40:35 -07:00
Eugene Rakhmatulin
32674c2619 removed temporary patch as it causes more issues. 2026-03-28 17:49:17 -07:00
Eugene Rakhmatulin
d37217bad0 moved PR patch before the requirements patching 2026-03-28 09:22:19 -07:00
Eugene Rakhmatulin
e70c87b4f6 Added PR38423 (temp) 2026-03-28 08:50:54 -07:00
Eugene Rakhmatulin
51d69c5c17 commenting out non-applicable PRs 2026-03-27 16:15:54 -07:00
Eugene Rakhmatulin
e6ee108cdf Temporary patch for NVFP4 2026-03-26 11:43:44 -07:00
Eugene Rakhmatulin
174de6f0a8 temporary patch for PR38126 2026-03-26 08:58:04 -07:00
Eugene Rakhmatulin
c4b078b868 Merge branch 'main' into 3-node 2026-03-24 22:21:25 -07:00
Drew Botwinick
8298c3d7f8 Merge remote-tracking branch 'upstream/main' (Conflicts: Dockerfile) 2026-03-24 15:41:09 -05:00
Eugene Rakhmatulin
f8c2653fd3 Quick fix for NCCL dependency 2026-03-23 23:20:59 -07:00
Eugene Rakhmatulin
990a7b3837 Use mesh-optimized NCCL 2026-03-23 15:43:18 -07:00
Eugene Rakhmatulin
7a54657abf Revert "cuda 13.2 torch" (This reverts commit 926dd57a87.) 2026-03-21 15:36:17 -07:00
Eugene Rakhmatulin
926dd57a87 cuda 13.2 torch 2026-03-21 15:15:01 -07:00
Eugene Rakhmatulin
6e8d85c914 cleanup 2026-03-21 15:12:12 -07:00
Drew Botwinick
d6e76f8e2f add build metadata generation and include in Dockerfiles 2026-03-21 16:10:04 -05:00
Eugene Rakhmatulin
8385506c5e Fixes 2026-03-20 23:51:21 -07:00
Eugene Rakhmatulin
8caebe3155 Reverting back to CUDA image + pytorch from wheels 2026-03-20 17:03:18 -07:00
Eugene Rakhmatulin
03b055d7f0 Major cluster orchestration refactoring to support running without Ray 2026-03-13 11:55:18 -07:00
Eugene Rakhmatulin
e225c709fb Revert "fix: add temporary patch for CUDA graphs estimation" as it has been merged to main (This reverts commit 63b2a8dbed.) 2026-03-09 09:46:50 -07:00
Eugene Rakhmatulin
63b2a8dbed fix: add temporary patch for CUDA graphs estimation 2026-03-08 22:43:41 -07:00
Eugene Rakhmatulin
2d03bc138d saving flashinfer and vllm commits in wheels directories 2026-03-05 14:41:25 -08:00
Eugene Rakhmatulin
bbd7db2813 revert bumping up base image 2026-03-04 07:29:53 -08:00
Eugene Rakhmatulin
fff1a24982 Rolling back base image 2026-03-04 07:19:43 -08:00
Eugene Rakhmatulin
ae19b66fdd Bumped base image version 2026-03-03 23:31:51 -08:00
Eugene Rakhmatulin
5a3536b38e Fixed a bug where updated tags would cause git fetch to fail 2026-02-24 20:59:54 -08:00
Eugene Rakhmatulin
3c27d521bb Reverting another breaking vLLM PR, fixes #60 2026-02-23 09:51:45 -08:00
Eugene Rakhmatulin
c60c16e867 Temporary patch to reverse PR that fails builds 2026-02-18 16:20:20 -08:00
Eugene Rakhmatulin
f09c2c3ac8 Refactoring, updated README 2026-02-18 15:58:53 -08:00
Eugene Rakhmatulin
ec0f189256 Initial refactoring to enable separate wheel builds 2026-02-17 19:15:32 -08:00
Eugene Rakhmatulin
4214d4fefe Caching cubins during build for reuse 2026-02-13 19:30:28 -08:00
Eugene Rakhmatulin
da4185cb12 Fixed an issue with fetching latest vLLM code 2026-02-11 22:35:49 -08:00
Eugene Rakhmatulin
3b1e49dcb0 Supporting other CUDA archs via --gpu-arch flag 2026-02-11 13:10:41 -08:00
Eugene Rakhmatulin
ace16f3a8f Applied new fastsafetensors fix to mxfp4 build; disabled wheel builds by default 2026-02-09 23:47:06 -08:00
Eugene Rakhmatulin
2923fe6ea5 Removed temp fastsafetensors patch 2026-02-09 10:21:14 -08:00
Eugene Rakhmatulin
06e8817f18 Triton 3.6.0 is now default 2026-02-08 22:38:31 -08:00