| Author | Commit | Message | Date |
|---|---|---|---|
| Eugene Rakhmatulin | 45494688d1 | Updated README, added NVFP4 fix | 2026-03-30 11:45:40 -07:00 |
| Eugene Rakhmatulin | a3201f8873 | --flashinfer-ref / --apply-flashinfer-pr | 2026-03-29 22:40:35 -07:00 |
| Eugene Rakhmatulin | 32674c2619 | removed temporary patch as it causes more issues. | 2026-03-28 17:49:17 -07:00 |
| Eugene Rakhmatulin | d37217bad0 | moved PR patch before the requirements patching | 2026-03-28 09:22:19 -07:00 |
| Eugene Rakhmatulin | e70c87b4f6 | Added PR38423 (temp) | 2026-03-28 08:50:54 -07:00 |
| Eugene Rakhmatulin | 51d69c5c17 | commenting out non-applicable PRs | 2026-03-27 16:15:54 -07:00 |
| Eugene Rakhmatulin | e6ee108cdf | Temporary patch for NVFP4 | 2026-03-26 11:43:44 -07:00 |
| Eugene Rakhmatulin | 174de6f0a8 | temporary patch for PR38126 | 2026-03-26 08:58:04 -07:00 |
| Eugene Rakhmatulin | c4b078b868 | Merge branch 'main' into 3-node | 2026-03-24 22:21:25 -07:00 |
| Drew Botwinick | 8298c3d7f8 | Merge remote-tracking branch 'upstream/main' # Conflicts: # Dockerfile | 2026-03-24 15:41:09 -05:00 |
| Eugene Rakhmatulin | f8c2653fd3 | Quick fix for NCCL dependency | 2026-03-23 23:20:59 -07:00 |
| Eugene Rakhmatulin | 990a7b3837 | Use mesh-optimized NCCL | 2026-03-23 15:43:18 -07:00 |
| Eugene Rakhmatulin | 7a54657abf | Revert "cuda 13.2 torch" This reverts commit 926dd57a87. | 2026-03-21 15:36:17 -07:00 |
| Eugene Rakhmatulin | 926dd57a87 | cuda 13.2 torch | 2026-03-21 15:15:01 -07:00 |
| Eugene Rakhmatulin | 6e8d85c914 | cleanup | 2026-03-21 15:12:12 -07:00 |
| Drew Botwinick | d6e76f8e2f | add build metadata generation and include in Dockerfiles | 2026-03-21 16:10:04 -05:00 |
| Eugene Rakhmatulin | 8385506c5e | Fixes | 2026-03-20 23:51:21 -07:00 |
| Eugene Rakhmatulin | 8caebe3155 | Reverting back to CUDA image + pytorch from wheels | 2026-03-20 17:03:18 -07:00 |
| Eugene Rakhmatulin | 03b055d7f0 | Major cluster orchestration refactoring to support running without Ray | 2026-03-13 11:55:18 -07:00 |
| Eugene Rakhmatulin | e225c709fb | Revert "fix: add temporary patch for CUDA graphs estimation" as it has been merged to main This reverts commit 63b2a8dbed. | 2026-03-09 09:46:50 -07:00 |
| Eugene Rakhmatulin | 63b2a8dbed | fix: add temporary patch for CUDA graphs estimation | 2026-03-08 22:43:41 -07:00 |
| Eugene Rakhmatulin | 2d03bc138d | saving flashinfer and vllm commits in wheels directories | 2026-03-05 14:41:25 -08:00 |
| Eugene Rakhmatulin | bbd7db2813 | revert bumping up base image | 2026-03-04 07:29:53 -08:00 |
| Eugene Rakhmatulin | fff1a24982 | Rolling back base image | 2026-03-04 07:19:43 -08:00 |
| Eugene Rakhmatulin | ae19b66fdd | Bumped base image version | 2026-03-03 23:31:51 -08:00 |
| Eugene Rakhmatulin | 5a3536b38e | Fixed a bug where updated tags would cause git fetch to fail | 2026-02-24 20:59:54 -08:00 |
| Eugene Rakhmatulin | 3c27d521bb | Reverting another breaking vLLM PR, fixes #60 | 2026-02-23 09:51:45 -08:00 |
| Eugene Rakhmatulin | c60c16e867 | Temporary patch to reverse PR that fails builds | 2026-02-18 16:20:20 -08:00 |
| Eugene Rakhmatulin | f09c2c3ac8 | Refactoring, updated README | 2026-02-18 15:58:53 -08:00 |
| Eugene Rakhmatulin | ec0f189256 | Initial refactoring to enable separate wheel builds | 2026-02-17 19:15:32 -08:00 |
| Eugene Rakhmatulin | 4214d4fefe | Caching cubins during build for reuse | 2026-02-13 19:30:28 -08:00 |
| Eugene Rakhmatulin | da4185cb12 | Fixed an issue with fetching latest vLLM code | 2026-02-11 22:35:49 -08:00 |
| Eugene Rakhmatulin | 3b1e49dcb0 | Supporting other CUDA archs via --gpu-arch flag | 2026-02-11 13:10:41 -08:00 |
| Eugene Rakhmatulin | ace16f3a8f | Applied new fastsafetensors fix to mxfp4 build; disabled wheel builds by default | 2026-02-09 23:47:06 -08:00 |
| Eugene Rakhmatulin | 2923fe6ea5 | Removed temp fastsafetensors patch | 2026-02-09 10:21:14 -08:00 |
| Eugene Rakhmatulin | 06e8817f18 | Triton 3.6.0 is now default | 2026-02-08 22:38:31 -08:00 |
| Eugene Rakhmatulin | d845cd0401 | changed arch to 12.1a again | 2026-02-08 14:18:12 -08:00 |
| Eugene Rakhmatulin | 79e646e833 | Merge branch 'apply-pr' into pytorch-base | 2026-02-03 14:14:45 -08:00 |
| Eugene Rakhmatulin | d7e9f17c2e | vLLM build-time PRs support | 2026-02-03 14:14:11 -08:00 |
| Eugene Rakhmatulin | 37953478f0 | changed arch codes again to be in line with upcoming PR | 2026-02-02 09:21:48 -08:00 |
| Eugene Rakhmatulin | 3c7f91081d | changed arch flags | 2026-02-01 16:37:01 -08:00 |
| Eugene Rakhmatulin | 5f7d480801 | Reverted Triton removal to use system triton package | 2026-01-31 23:23:59 -08:00 |
| Eugene Rakhmatulin | 9691eed1b0 | Disabled Triton build for now | 2026-01-31 00:10:52 -08:00 |
| Eugene Rakhmatulin | 7c61b4057c | Added Triton compilation to custom build | 2026-01-30 23:44:20 -08:00 |
| Eugene Rakhmatulin | 518dc0108b | moved deps buster | 2026-01-30 15:25:54 -08:00 |
| Eugene Rakhmatulin | a13c7d3007 | cosmetic changes | 2026-01-30 13:26:57 -08:00 |
| Eugene Rakhmatulin | 7dd0642621 | Reduced final image size | 2026-01-30 13:16:55 -08:00 |
| Eugene Rakhmatulin | be19675980 | Fixed initial vllm source fetch if not using main branch | 2026-01-30 11:24:51 -08:00 |
| Eugene Rakhmatulin | af6d5eae32 | Temporarily removing incompatible triton-kernels | 2026-01-30 11:17:38 -08:00 |
| Eugene Rakhmatulin | 7d232a305a | Reverted to Torch 2.9.1 in the source build to address #24 | 2026-01-30 10:43:12 -08:00 |