Eugene Rakhmatulin
20a6699bf7
Add launch_cluster script for managing cluster nodes and actions
2025-12-18 13:11:13 -08:00
Eugene Rakhmatulin
1025243316
Added launch_cluster script to simplify launching cluster on nodes.
2025-12-18 13:10:57 -08:00
Christopher Owen
a13a9f6806
Limit build parallelism to reduce OOM situations
2025-12-18 13:36:35 +01:00
Eric Lewis
11355677f6
Add parallel copy option to build-and-copy.sh
Introduced the --copy-parallel flag to enable concurrent copying of Docker images to multiple hosts. Updated the README with usage instructions and details about the new option. Refactored the script to support both serial and parallel copy modes for improved efficiency.
2025-12-18 01:24:48 -05:00
Eric Lewis
e67abd5e6e
Add multi-host copy support to build-and-copy.sh
Updated build-and-copy.sh to support copying Docker images to multiple hosts using the new -c/--copy-to flag, which accepts space- or comma-separated host lists. The old --copy-to-host flag is retained as an alias for backward compatibility, and -h is now used for help. The README was updated to document these changes and provide new usage examples.
2025-12-18 00:32:45 -05:00
Eugene Rakhmatulin
e0f6cff132
Merge pull request #1
2025-12-16 21:32:42 -08:00
TeskaLabs Admin
f1abfb85b6
Bump of the version
2025-12-16 17:58:48 +00:00
Eugene Rakhmatulin
79f6a204d1
Update README.md
2025-12-15 09:51:49 -08:00
Eugene Rakhmatulin
0606b1b984
Refactor Triton and vLLM reference handling in Dockerfile and build script
2025-12-14 23:28:08 -08:00
eugr
4551795908
Fixed missing InfiniBand dependency, added cuDNN
2025-12-14 21:49:50 -08:00
eugr
33720fc9d6
Use no-build-isolation for Triton Kernels build
2025-12-14 18:35:26 -08:00
eugr
dc614dc6ae
Separated Triton build into a dedicated phase for better caching
2025-12-14 10:32:28 -08:00
eugr
25f759fec8
Optimized triton caching
2025-12-14 09:26:10 -08:00
eugr
02f842e1fd
Updated README
2025-12-14 00:39:15 -08:00
eugr
e8a12da072
Build triton from source; add TRITON_SHA argument to specify triton release, and add timing statistics
2025-12-14 00:30:50 -08:00
eugr
a8217a1fd8
Improved dependency handling
2025-12-13 22:41:30 -08:00
eugr
cc3e73feb1
Improved caching
2025-12-13 21:34:57 -08:00
eugr
76a8e92c86
Multistage build with caching
2025-12-13 21:18:26 -08:00
eugr
295e1f2266
Removed MiniMax M2 temporary patch from Dockerfile; updated README.md
2025-12-11 13:24:57 -08:00
eugr
37c12cf9e4
Removed MiniMax M2 patch since the fix is merged into main
2025-12-11 13:23:30 -08:00
eugr
5fba205db4
Implemented a temporary patch for recently broken MiniMax-M2 (in builds after 12/10) for some quants.
2025-12-11 11:13:05 -08:00
eugr
9d351cd6d5
Updated README
2025-12-05 11:32:02 -08:00
eugr
270446be27
Add build-and-copy script for automated image building and deployment
2025-12-05 11:28:43 -08:00
eugr
b10ed739fe
formatting changes
2025-11-29 10:04:12 -08:00
eugr
6a66a4b66f
Added patch to allow fastsafetensors in cluster config
2025-11-26 21:25:04 -08:00
eugr
712637a348
Added second RoCE interface to examples
2025-11-26 19:53:37 -08:00
eugr
bdf16a0a34
Formatting
2025-11-26 14:02:15 -08:00
eugr
cf8e411ad2
Added benchmarking
2025-11-26 14:01:04 -08:00
eugr
676fa2ace9
Formatting fix
2025-11-26 13:52:30 -08:00
eugr
4f27899939
Added some details on networking
2025-11-26 13:50:39 -08:00
eugr
1a4bc1d7aa
Typo
2025-11-26 13:44:34 -08:00
eugr
2a7d31ad81
Updated README
2025-11-26 13:30:17 -08:00
eugr
549214e6ed
Added missing InfiniBand and RDMA libraries
2025-11-25 16:14:08 -08:00
eugr
a96a3a2dac
Removed temporary patch for NVFP4 quants support as it's been merged into main
2025-11-25 12:48:58 -08:00
eugr
a93bd56389
Updated README
2025-11-24 21:44:01 -08:00
eugr
4c976375c5
Added missing dependencies; added dashboard support for Ray clusters
2025-11-24 21:13:06 -08:00
eugr
399948a725
Added missing modules for flashinfer
2025-11-24 17:02:04 -08:00
eugr
bd48032c45
Fixed typo in docker command in README
2025-11-24 16:34:19 -08:00
eugr
2cfa1db2cf
Updated README
2025-11-24 16:32:47 -08:00
eugr
6d6e4dfe50
Updated README
2025-11-24 16:23:00 -08:00
eugr
d3fd2e69fd
Updated Dockerfile with additional deps
2025-11-24 15:47:20 -08:00
eugr
f5141974ae
Fixed cluster script and small fix for Dockerfile
2025-11-24 15:45:04 -08:00
eugr
5c8feb086c
Updated README
2025-11-24 15:32:28 -08:00
eugr
3ecca4d2b7
Updated Dockerfile to include 2 levels of cache busters, added the cluster script and README.
2025-11-24 15:21:08 -08:00
eugr
0ad880e0fe
Added clustering script
2025-11-24 11:53:38 -08:00
eugr
4e95bf6fa6
Initial commit
2025-11-24 11:19:37 -08:00