| Author | Commit | Message | Date |
|---|---|---|---|
| Eugene Rakhmatulin | 76fbf0d0be | Fix for broken MiniMax M2 parser | 2026-04-15 16:31:50 -07:00 |
| Tim Messerschmidt | 2c13e1ce25 | Add InstantTensor to runtime dependencies | 2026-04-14 19:38:36 +02:00 |
| Eugene Rakhmatulin | cf4cb35356 | added new flashinfer build dependency | 2026-04-13 08:47:34 -07:00 |
| Eugene Rakhmatulin | b7c8616743 | Pinned pytorch version | 2026-04-11 11:54:46 -07:00 |
| Eugene Rakhmatulin | 8e8e850ef1 | fix for new requirements structure | 2026-04-10 20:14:47 -07:00 |
| Eugene Rakhmatulin | fc08740fba | Increased uv timeout | 2026-04-10 19:38:38 -07:00 |
| Eugene Rakhmatulin | 49d6d9fefd | Removed PR2927 as it's been merged | 2026-04-03 16:56:00 -07:00 |
| Eugene Rakhmatulin | 4afca860a5 | Fix broken compilation (PR 38919) | 2026-04-03 10:22:10 -07:00 |
| Eugene Rakhmatulin | 44808f7018 | Apply vLLM PR 35568 | 2026-04-02 17:13:54 -07:00 |
| Eugene Rakhmatulin | a770865834 | Updated PRs to apply | 2026-04-01 08:31:34 -07:00 |
| Eugene Rakhmatulin | 3a3ab98b3e | Temporarily added PR2897 to Dockerfile | 2026-03-31 22:06:08 -07:00 |
| Eugene Rakhmatulin | 41c0ce2c9a | Fixed FI PR | 2026-03-30 14:25:42 -07:00 |
| Eugene Rakhmatulin | 45494688d1 | Updated README, added NVFP4 fix | 2026-03-30 11:45:40 -07:00 |
| Eugene Rakhmatulin | a3201f8873 | --flashinfer-ref / --apply-flashinfer-pr | 2026-03-29 22:40:35 -07:00 |
| Eugene Rakhmatulin | 32674c2619 | removed temporary patch as it causes more issues. | 2026-03-28 17:49:17 -07:00 |
| Eugene Rakhmatulin | d37217bad0 | moved PR patch before the requirements patching | 2026-03-28 09:22:19 -07:00 |
| Eugene Rakhmatulin | e70c87b4f6 | Added PR38423 (temp) | 2026-03-28 08:50:54 -07:00 |
| Eugene Rakhmatulin | 51d69c5c17 | commenting out non-applicable PRs | 2026-03-27 16:15:54 -07:00 |
| Eugene Rakhmatulin | e6ee108cdf | Temporary patch for NVFP4 | 2026-03-26 11:43:44 -07:00 |
| Eugene Rakhmatulin | 174de6f0a8 | temporary patch for PR38126 | 2026-03-26 08:58:04 -07:00 |
| Eugene Rakhmatulin | c4b078b868 | Merge branch 'main' into 3-node | 2026-03-24 22:21:25 -07:00 |
| Drew Botwinick | 8298c3d7f8 | Merge remote-tracking branch 'upstream/main' (Conflicts: Dockerfile) | 2026-03-24 15:41:09 -05:00 |
| Eugene Rakhmatulin | f8c2653fd3 | Quick fix for NCCL dependency | 2026-03-23 23:20:59 -07:00 |
| Eugene Rakhmatulin | 990a7b3837 | Use mesh-optimized NCCL | 2026-03-23 15:43:18 -07:00 |
| Eugene Rakhmatulin | 7a54657abf | Revert "cuda 13.2 torch". This reverts commit 926dd57a87. | 2026-03-21 15:36:17 -07:00 |
| Eugene Rakhmatulin | 926dd57a87 | cuda 13.2 torch | 2026-03-21 15:15:01 -07:00 |
| Eugene Rakhmatulin | 6e8d85c914 | cleanup | 2026-03-21 15:12:12 -07:00 |
| Drew Botwinick | d6e76f8e2f | add build metadata generation and include in Dockerfiles | 2026-03-21 16:10:04 -05:00 |
| Eugene Rakhmatulin | 8385506c5e | Fixes | 2026-03-20 23:51:21 -07:00 |
| Eugene Rakhmatulin | 8caebe3155 | Reverting back to CUDA image + pytorch from wheels | 2026-03-20 17:03:18 -07:00 |
| Eugene Rakhmatulin | 03b055d7f0 | Major cluster orchestration refactoring to support running without Ray | 2026-03-13 11:55:18 -07:00 |
| Eugene Rakhmatulin | e225c709fb | Revert "fix: add temporary patch for CUDA graphs estimation" as it has been merged to main. This reverts commit 63b2a8dbed. | 2026-03-09 09:46:50 -07:00 |
| Eugene Rakhmatulin | 63b2a8dbed | fix: add temporary patch for CUDA graphs estimation | 2026-03-08 22:43:41 -07:00 |
| Eugene Rakhmatulin | 2d03bc138d | saving flashinfer and vllm commits in wheels directories | 2026-03-05 14:41:25 -08:00 |
| Eugene Rakhmatulin | bbd7db2813 | revert bumping up base image | 2026-03-04 07:29:53 -08:00 |
| Eugene Rakhmatulin | fff1a24982 | Rolling back base image | 2026-03-04 07:19:43 -08:00 |
| Eugene Rakhmatulin | ae19b66fdd | Bumped base image version | 2026-03-03 23:31:51 -08:00 |
| Eugene Rakhmatulin | 5a3536b38e | Fixed a bug where updated tags would cause git fetch to fail | 2026-02-24 20:59:54 -08:00 |
| Eugene Rakhmatulin | 3c27d521bb | Reverting another breaking vLLM PR, fixes #60 | 2026-02-23 09:51:45 -08:00 |
| Eugene Rakhmatulin | c60c16e867 | Temporary patch to reverse PR that fails builds | 2026-02-18 16:20:20 -08:00 |
| Eugene Rakhmatulin | f09c2c3ac8 | Refactoring, updated README | 2026-02-18 15:58:53 -08:00 |
| Eugene Rakhmatulin | ec0f189256 | Initial refactoring to enable separate wheel builds | 2026-02-17 19:15:32 -08:00 |
| Eugene Rakhmatulin | 4214d4fefe | Caching cubins during build for reuse | 2026-02-13 19:30:28 -08:00 |
| Eugene Rakhmatulin | da4185cb12 | Fixed an issue with fetching latest vLLM code | 2026-02-11 22:35:49 -08:00 |
| Eugene Rakhmatulin | 3b1e49dcb0 | Supporting other CUDA archs via --gpu-arch flag | 2026-02-11 13:10:41 -08:00 |
| Eugene Rakhmatulin | ace16f3a8f | Applied new fastsafetensors fix to mxfp4 build; disabled wheel builds by default | 2026-02-09 23:47:06 -08:00 |
| Eugene Rakhmatulin | 2923fe6ea5 | Removed temp fastsafetensors patch | 2026-02-09 10:21:14 -08:00 |
| Eugene Rakhmatulin | 06e8817f18 | Triton 3.6.0 is now default | 2026-02-08 22:38:31 -08:00 |
| Eugene Rakhmatulin | d845cd0401 | changed arch to 12.1a again | 2026-02-08 14:18:12 -08:00 |
| Eugene Rakhmatulin | 79e646e833 | Merge branch 'apply-pr' into pytorch-base | 2026-02-03 14:14:45 -08:00 |