58 Commits

Author SHA1 Message Date
Eugene Rakhmatulin
f09c2c3ac8 Refactoring, updated README 2026-02-18 15:58:53 -08:00
Eugene Rakhmatulin
ec0f189256 Initial refactoring to enable separate wheel builds 2026-02-17 19:15:32 -08:00
Eugene Rakhmatulin
4214d4fefe Caching cubins during build for reuse 2026-02-13 19:30:28 -08:00
Eugene Rakhmatulin
da4185cb12 Fixed an issue with fetching latest vLLM code 2026-02-11 22:35:49 -08:00
Eugene Rakhmatulin
3b1e49dcb0 Supporting other CUDA archs via --gpu-arch flag 2026-02-11 13:10:41 -08:00
Eugene Rakhmatulin
ace16f3a8f Applied new fastsafetensors fix to mxfp4 build; disabled wheel builds by default 2026-02-09 23:47:06 -08:00
Eugene Rakhmatulin
2923fe6ea5 Removed temp fastsafetensors patch 2026-02-09 10:21:14 -08:00
Eugene Rakhmatulin
06e8817f18 Triton 3.6.0 is now default 2026-02-08 22:38:31 -08:00
Eugene Rakhmatulin
d845cd0401 changed arch to 12.1a again 2026-02-08 14:18:12 -08:00
Eugene Rakhmatulin
79e646e833 Merge branch 'apply-pr' into pytorch-base 2026-02-03 14:14:45 -08:00
Eugene Rakhmatulin
d7e9f17c2e vLLM build-time PRs support 2026-02-03 14:14:11 -08:00
Eugene Rakhmatulin
37953478f0 changed arch codes again to be in line with upcoming PR 2026-02-02 09:21:48 -08:00
Eugene Rakhmatulin
3c7f91081d changed arch flags 2026-02-01 16:37:01 -08:00
Eugene Rakhmatulin
5f7d480801 Reverted Triton removal to use system triton package 2026-01-31 23:23:59 -08:00
Eugene Rakhmatulin
9691eed1b0 Disabled Triton build for now 2026-01-31 00:10:52 -08:00
Eugene Rakhmatulin
7c61b4057c Added Triton compilation to custom build 2026-01-30 23:44:20 -08:00
Eugene Rakhmatulin
518dc0108b moved deps buster 2026-01-30 15:25:54 -08:00
Eugene Rakhmatulin
a13c7d3007 cosmetic changes 2026-01-30 13:26:57 -08:00
Eugene Rakhmatulin
7dd0642621 Reduced final image size 2026-01-30 13:16:55 -08:00
Eugene Rakhmatulin
be19675980 Fixed initial vllm source fetch if not using main branch 2026-01-30 11:24:51 -08:00
Eugene Rakhmatulin
af6d5eae32 Temporarily removing incompatible triton-kernels 2026-01-30 11:17:38 -08:00
Eugene Rakhmatulin
7d232a305a Reverted to Torch 2.9.1 in the source build to address #24 2026-01-30 10:43:12 -08:00
Eugene Rakhmatulin
458439706a Build flashinfer from source 2026-01-30 09:05:22 -08:00
Eugene Rakhmatulin
ef0f996df6 Bumped base image version; reverted Triton to 3.5.1 2026-01-29 23:14:43 -08:00
Eugene Rakhmatulin
0ac438b4dd Some optimizations 2026-01-29 22:08:05 -08:00
Eugene Rakhmatulin
46fecd172a added missing dependency 2026-01-29 17:01:17 -08:00
Eugene Rakhmatulin
159460af0c Migrated dockerfiles to pytorch-base image 2026-01-29 15:47:07 -08:00
Eugene Rakhmatulin
e817f3dbec Updated Triton version to 3.6.0 2026-01-26 14:24:58 -08:00
Eugene Rakhmatulin
25a16ef6c2 Fixed #11 and #12 - added a new dependency for OpenCV 2026-01-19 12:07:15 -08:00
Eugene Rakhmatulin
1139a37324 Added transformers v5 support 2025-12-21 22:41:03 -08:00
Eugene Rakhmatulin
11db634aad Switch to uv in the main Dockerfile 2025-12-21 13:28:40 -08:00
Eugene Rakhmatulin
dfe426e912 Add support for pre-release FlashInfer packages in Docker builds 2025-12-20 23:13:26 -08:00
Eugene Rakhmatulin
a83200573a Enhance Dockerfile: limit ccache size, enable compression, and optimize git repo size 2025-12-20 15:29:37 -08:00
Eugene Rakhmatulin
fbb1bf73d5 Switching to flashinfer 0.6.x pre-release wheels 2025-12-20 13:28:06 -08:00
Christopher Owen
a13a9f6806 Limit build parallelism to reduce OOM situations 2025-12-18 13:36:35 +01:00
TeskaLabs Admin
f1abfb85b6 Bump of the version 2025-12-16 17:58:48 +00:00
Eugene Rakhmatulin
0606b1b984 Refactor Triton and vLLM reference handling in Dockerfile and build script 2025-12-14 23:28:08 -08:00
eugr
4551795908 Fixed missing Infiniband dependency, added CuDNN 2025-12-14 21:49:50 -08:00
eugr
33720fc9d6 Use no-build-isolation for Triton Kernels build 2025-12-14 18:35:26 -08:00
eugr
dc614dc6ae Separated Triton build into a dedicated phase for better caching 2025-12-14 10:32:28 -08:00
eugr
25f759fec8 Optimized triton caching 2025-12-14 09:26:10 -08:00
eugr
e8a12da072 Build triton from source; add TRITON_SHA argument to specify triton release, and add timing statistics 2025-12-14 00:30:50 -08:00
eugr
a8217a1fd8 Improved dependency handling 2025-12-13 22:41:30 -08:00
eugr
cc3e73feb1 Improved caching 2025-12-13 21:34:57 -08:00
eugr
76a8e92c86 Multistage build with caching 2025-12-13 21:18:26 -08:00
eugr
37c12cf9e4 Removed MiniMax M2 patch since the fix is merged into main 2025-12-11 13:23:30 -08:00
eugr
5fba205db4 Implemented a temporary patch for recently broken MiniMax-M2 (in builds after 12/10) for some quants. 2025-12-11 11:13:05 -08:00
eugr
b10ed739fe formatting changes 2025-11-29 10:04:12 -08:00
eugr
6a66a4b66f Added patch to allow fastsafetensors in cluster config 2025-11-26 21:25:04 -08:00
eugr
549214e6ed Added missing Infiniband and RDMA libraries 2025-11-25 16:14:08 -08:00