Eugene Rakhmatulin
|
990a7b3837
|
Use mesh-optimized NCCL
|
2026-03-23 15:43:18 -07:00 |
|
Eugene Rakhmatulin
|
7a54657abf
|
Revert "cuda 13.2 torch"
This reverts commit 926dd57a87.
|
2026-03-21 15:36:17 -07:00 |
|
Eugene Rakhmatulin
|
926dd57a87
|
cuda 13.2 torch
|
2026-03-21 15:15:01 -07:00 |
|
Eugene Rakhmatulin
|
6e8d85c914
|
cleanup
|
2026-03-21 15:12:12 -07:00 |
|
Eugene Rakhmatulin
|
8385506c5e
|
Fixes
|
2026-03-20 23:51:21 -07:00 |
|
Eugene Rakhmatulin
|
8caebe3155
|
Reverting back to CUDA image + pytorch from wheels
|
2026-03-20 17:03:18 -07:00 |
|
Eugene Rakhmatulin
|
03b055d7f0
|
Major cluster orchestration refactoring to support running without Ray
|
2026-03-13 11:55:18 -07:00 |
|
Eugene Rakhmatulin
|
e225c709fb
|
Revert "fix: add temporary patch for CUDA graphs estimation" as it has been merged to main
This reverts commit 63b2a8dbed.
|
2026-03-09 09:46:50 -07:00 |
|
Eugene Rakhmatulin
|
63b2a8dbed
|
fix: add temporary patch for CUDA graphs estimation
|
2026-03-08 22:43:41 -07:00 |
|
Eugene Rakhmatulin
|
2d03bc138d
|
saving flashinfer and vllm commits in wheels directories
|
2026-03-05 14:41:25 -08:00 |
|
Eugene Rakhmatulin
|
bbd7db2813
|
revert bumping up base image
|
2026-03-04 07:29:53 -08:00 |
|
Eugene Rakhmatulin
|
fff1a24982
|
Rolling back base image
|
2026-03-04 07:19:43 -08:00 |
|
Eugene Rakhmatulin
|
ae19b66fdd
|
Bumped base image version
|
2026-03-03 23:31:51 -08:00 |
|
Eugene Rakhmatulin
|
5a3536b38e
|
Fixed a bug where updated tags would cause git fetch to fail
|
2026-02-24 20:59:54 -08:00 |
|
Eugene Rakhmatulin
|
3c27d521bb
|
Reverting another breaking vLLM PR, fixes #60
|
2026-02-23 09:51:45 -08:00 |
|
Eugene Rakhmatulin
|
c60c16e867
|
Temporary patch to reverse PR that fails builds
|
2026-02-18 16:20:20 -08:00 |
|
Eugene Rakhmatulin
|
f09c2c3ac8
|
Refactoring, updated README
|
2026-02-18 15:58:53 -08:00 |
|
Eugene Rakhmatulin
|
ec0f189256
|
Initial refactoring to enable separate wheel builds
|
2026-02-17 19:15:32 -08:00 |
|
Eugene Rakhmatulin
|
4214d4fefe
|
Caching cubins during build for reuse
|
2026-02-13 19:30:28 -08:00 |
|
Eugene Rakhmatulin
|
da4185cb12
|
Fixed an issue with fetching latest vLLM code
|
2026-02-11 22:35:49 -08:00 |
|
Eugene Rakhmatulin
|
3b1e49dcb0
|
Supporting other CUDA archs via --gpu-arch flag
|
2026-02-11 13:10:41 -08:00 |
|
Eugene Rakhmatulin
|
ace16f3a8f
|
Applied new fastsafetensors fix to mxfp4 build; disabled wheel builds by default
|
2026-02-09 23:47:06 -08:00 |
|
Eugene Rakhmatulin
|
2923fe6ea5
|
Removed temp fastsafetensors patch
|
2026-02-09 10:21:14 -08:00 |
|
Eugene Rakhmatulin
|
06e8817f18
|
Triton 3.6.0 is now default
|
2026-02-08 22:38:31 -08:00 |
|
Eugene Rakhmatulin
|
d845cd0401
|
changed arch to 12.1a again
|
2026-02-08 14:18:12 -08:00 |
|
Eugene Rakhmatulin
|
79e646e833
|
Merge branch 'apply-pr' into pytorch-base
|
2026-02-03 14:14:45 -08:00 |
|
Eugene Rakhmatulin
|
d7e9f17c2e
|
vLLM build-time PRs support
|
2026-02-03 14:14:11 -08:00 |
|
Eugene Rakhmatulin
|
37953478f0
|
changed arch codes again to be in line with upcoming PR
|
2026-02-02 09:21:48 -08:00 |
|
Eugene Rakhmatulin
|
3c7f91081d
|
changed arch flags
|
2026-02-01 16:37:01 -08:00 |
|
Eugene Rakhmatulin
|
5f7d480801
|
Reverted Triton removal to use system triton package
|
2026-01-31 23:23:59 -08:00 |
|
Eugene Rakhmatulin
|
9691eed1b0
|
Disabled Triton build for now
|
2026-01-31 00:10:52 -08:00 |
|
Eugene Rakhmatulin
|
7c61b4057c
|
Added Triton compilation to custom build
|
2026-01-30 23:44:20 -08:00 |
|
Eugene Rakhmatulin
|
518dc0108b
|
moved deps buster
|
2026-01-30 15:25:54 -08:00 |
|
Eugene Rakhmatulin
|
a13c7d3007
|
cosmetic changes
|
2026-01-30 13:26:57 -08:00 |
|
Eugene Rakhmatulin
|
7dd0642621
|
Reduced final image size
|
2026-01-30 13:16:55 -08:00 |
|
Eugene Rakhmatulin
|
be19675980
|
Fixed initial vllm source fetch if not using main branch
|
2026-01-30 11:24:51 -08:00 |
|
Eugene Rakhmatulin
|
af6d5eae32
|
Temporarily removing incompatible triton-kernels
|
2026-01-30 11:17:38 -08:00 |
|
Eugene Rakhmatulin
|
7d232a305a
|
Reverted to Torch 2.9.1 in the source build to address #24
|
2026-01-30 10:43:12 -08:00 |
|
Eugene Rakhmatulin
|
458439706a
|
Build flashinfer from source
|
2026-01-30 09:05:22 -08:00 |
|
Eugene Rakhmatulin
|
ef0f996df6
|
Bumped base image version; reverted Triton to 3.5.1
|
2026-01-29 23:14:43 -08:00 |
|
Eugene Rakhmatulin
|
0ac438b4dd
|
Some optimizations
|
2026-01-29 22:08:05 -08:00 |
|
Eugene Rakhmatulin
|
46fecd172a
|
added missing dependency
|
2026-01-29 17:01:17 -08:00 |
|
Eugene Rakhmatulin
|
159460af0c
|
Migrated dockerfiles to pytorch-base image
|
2026-01-29 15:47:07 -08:00 |
|
Eugene Rakhmatulin
|
e817f3dbec
|
Updated Triton version to 3.6.0
|
2026-01-26 14:24:58 -08:00 |
|
Eugene Rakhmatulin
|
25a16ef6c2
|
Fixed #11 and #12 - added a new dependency for OpenCV
|
2026-01-19 12:07:15 -08:00 |
|
Eugene Rakhmatulin
|
1139a37324
|
Added transformers v5 support
|
2025-12-21 22:41:03 -08:00 |
|
Eugene Rakhmatulin
|
11db634aad
|
Switch to uv in the main Dockerfile
|
2025-12-21 13:28:40 -08:00 |
|
Eugene Rakhmatulin
|
dfe426e912
|
Add support for pre-release FlashInfer packages in Docker builds
|
2025-12-20 23:13:26 -08:00 |
|
Eugene Rakhmatulin
|
a83200573a
|
Enhance Dockerfile: limit ccache size, enable compression, and optimize git repo size
|
2025-12-20 15:29:37 -08:00 |
|
Eugene Rakhmatulin
|
fbb1bf73d5
|
Switching to flashinfer 0.6.x pre-release wheels
|
2025-12-20 13:28:06 -08:00 |
|