Eugene Rakhmatulin | 369283f655 | Updated README.md with launch-cluster details. | 2025-12-18 15:25:22 -08:00
Eugene Rakhmatulin | db5c443905 | Enhance launch-cluster script with improved node detection and SSH scanning using netcat and Python | 2025-12-18 14:52:23 -08:00
Eugene Rakhmatulin | 6c04ebfca1 | Refactor launch-cluster script to include cluster running checks and streamline start process for head and worker nodes | 2025-12-18 14:50:26 -08:00
Eugene Rakhmatulin | f7a15bfaf5 | Enhance launch-cluster script with improved SSH connectivity checks for worker nodes | 2025-12-18 14:22:48 -08:00
Eugene Rakhmatulin | 25b1d8eb4f | Enhance launch-cluster script with auto-detection for interfaces and nodes | 2025-12-18 13:53:28 -08:00
Eugene Rakhmatulin | a1ed352635 | renamed launch-cluster for consistency | 2025-12-18 13:11:48 -08:00
Eugene Rakhmatulin | 20a6699bf7 | Add launch_cluster script for managing cluster nodes and actions | 2025-12-18 13:11:13 -08:00
Eugene Rakhmatulin | 1025243316 | Added launch_cluster script to simplify launching cluster on nodes. | 2025-12-18 13:10:57 -08:00
Eugene Rakhmatulin | e0f6cff132 | Merge pull request #1 | 2025-12-16 21:32:42 -08:00
TeskaLabs Admin | f1abfb85b6 | Bump of the version | 2025-12-16 17:58:48 +00:00
Eugene Rakhmatulin | 79f6a204d1 | Update README.md | 2025-12-15 09:51:49 -08:00
Eugene Rakhmatulin | 0606b1b984 | Refactor Triton and vLLM reference handling in Dockerfile and build script | 2025-12-14 23:28:08 -08:00
eugr | 4551795908 | Fixed missing Infiniband dependency, added CuDNN | 2025-12-14 21:49:50 -08:00
eugr | 33720fc9d6 | Use no-build-isolation for Triton Kernels build | 2025-12-14 18:35:26 -08:00
eugr | dc614dc6ae | Separated Triton build into a dedicated phase for better caching | 2025-12-14 10:32:28 -08:00
eugr | 25f759fec8 | Optimized triton caching | 2025-12-14 09:26:10 -08:00
eugr | 02f842e1fd | Updated README | 2025-12-14 00:39:15 -08:00
eugr | e8a12da072 | Build triton from source; add TRITON_SHA argument to specify triton release, and add timing statistics | 2025-12-14 00:30:50 -08:00
eugr | a8217a1fd8 | Improved dependency handling | 2025-12-13 22:41:30 -08:00
eugr | cc3e73feb1 | Improved caching | 2025-12-13 21:34:57 -08:00
eugr | 76a8e92c86 | Multistage build with caching | 2025-12-13 21:18:26 -08:00
eugr | 295e1f2266 | Removed MiniMax M2 temporary patch from Dockerfile; updated README.md | 2025-12-11 13:24:57 -08:00
eugr | 37c12cf9e4 | Removed MiniMax M2 patch since the fix is merged into main | 2025-12-11 13:23:30 -08:00
eugr | 5fba205db4 | Implemented a temporary patch for recently broken MiniMax-M2 (in builds after 12/10) for some quants. | 2025-12-11 11:13:05 -08:00
eugr | 9d351cd6d5 | Updated README | 2025-12-05 11:32:02 -08:00
eugr | 270446be27 | Add build-and-copy script for automated image building and deployment | 2025-12-05 11:28:43 -08:00
eugr | b10ed739fe | formatting changes | 2025-11-29 10:04:12 -08:00
eugr | 6a66a4b66f | Added patch to allow fastsafetensors in cluster config | 2025-11-26 21:25:04 -08:00
eugr | 712637a348 | Added second RoCE interface to examples | 2025-11-26 19:53:37 -08:00
eugr | bdf16a0a34 | Formatting | 2025-11-26 14:02:15 -08:00
eugr | cf8e411ad2 | Added benchmarking | 2025-11-26 14:01:04 -08:00
eugr | 676fa2ace9 | Formatting fix | 2025-11-26 13:52:30 -08:00
eugr | 4f27899939 | Added some details on networking | 2025-11-26 13:50:39 -08:00
eugr | 1a4bc1d7aa | Typo | 2025-11-26 13:44:34 -08:00
eugr | 2a7d31ad81 | Updated README | 2025-11-26 13:30:17 -08:00
eugr | 549214e6ed | Added missing Infiniband and RDMA libraries | 2025-11-25 16:14:08 -08:00
eugr | a96a3a2dac | Removed temporary patch for NVFP4 quants support as it's been merged into main | 2025-11-25 12:48:58 -08:00
eugr | a93bd56389 | Updated README | 2025-11-24 21:44:01 -08:00
eugr | 4c976375c5 | Added missing dependencies; added dashboard support for Ray clusters | 2025-11-24 21:13:06 -08:00
eugr | 399948a725 | Added missing modules for flashinfer | 2025-11-24 17:02:04 -08:00
eugr | bd48032c45 | Fixed typo in docker command in README | 2025-11-24 16:34:19 -08:00
eugr | 2cfa1db2cf | Updated README | 2025-11-24 16:32:47 -08:00
eugr | 6d6e4dfe50 | Updated README | 2025-11-24 16:23:00 -08:00
eugr | d3fd2e69fd | Updated Dockerfile with additional deps | 2025-11-24 15:47:20 -08:00
eugr | f5141974ae | Fixed cluster script and small fix for Dockerfile | 2025-11-24 15:45:04 -08:00
eugr | 5c8feb086c | Updated README | 2025-11-24 15:32:28 -08:00
eugr | 3ecca4d2b7 | Updated Dockerfile to include 2 levels of cache busters, added the cluster script and README. | 2025-11-24 15:21:08 -08:00
eugr | 0ad880e0fe | Added clustering script | 2025-11-24 11:53:38 -08:00
eugr | 4e95bf6fa6 | Initial commit | 2025-11-24 11:19:37 -08:00