Eugene Rakhmatulin
1464b0dc8f
Display image name in launch-cluster.sh output
2025-12-21 22:44:01 -08:00
Eugene Rakhmatulin
786a50c5c7
Updated README
2025-12-21 22:41:48 -08:00
Eugene Rakhmatulin
1139a37324
Added transformers v5 support
2025-12-21 22:41:03 -08:00
Eugene Rakhmatulin
c37053adf6
Updated README
2025-12-21 14:57:35 -08:00
Eugene Rakhmatulin
82802f0cad
Added Quickstart section to README
2025-12-21 14:53:05 -08:00
Eugene Rakhmatulin
11db634aad
Switch to uv in the main Dockerfile
2025-12-21 13:28:40 -08:00
Eugene Rakhmatulin
bbd3469549
Support vLLM release wheels
2025-12-21 11:15:52 -08:00
Eugene Rakhmatulin
2aa545a810
Added PSA about build cache
2025-12-21 00:49:59 -08:00
Eugene Rakhmatulin
63a1a6a97c
Update README to reflect reduced build time and container size for vLLM
2025-12-20 23:16:12 -08:00
Eugene Rakhmatulin
dfe426e912
Add support for pre-release FlashInfer packages in Docker builds
2025-12-20 23:13:26 -08:00
Eugene Rakhmatulin
1b3968fe98
Merge branch 'flashinfer-0.6.0-pre'
2025-12-20 23:02:58 -08:00
Eugene Rakhmatulin
9f35dbdd2d
Reverted back to release flashinfer
2025-12-20 23:01:49 -08:00
Eugene Rakhmatulin
d5d85aaac7
Added optional flashinfer packages, using pre-release flashinfer
2025-12-20 22:56:40 -08:00
Eugene Rakhmatulin
76988e0c75
Added --use-wheels to use precompiled vLLM wheels instead of compiling from the source
2025-12-20 20:25:07 -08:00
Eugene Rakhmatulin
a83200573a
Enhance Dockerfile: limit ccache size, enable compression, and optimize git repo size
2025-12-20 15:29:37 -08:00
Eugene Rakhmatulin
fbb1bf73d5
Switching to flashinfer 0.6.x pre-release wheels
2025-12-20 13:28:06 -08:00
Eugene Rakhmatulin
f075801c59
Fixed launch_cluster bug introduced by refactoring
2025-12-19 10:51:50 -08:00
Eugene Rakhmatulin
0cac77c286
Fixed contributor username
2025-12-19 10:41:03 -08:00
Eugene Rakhmatulin
3eb57a6d49
Updated README - autodiscovery in copy ops
2025-12-19 10:39:28 -08:00
Eugene Rakhmatulin
a351f182cc
Implement autodiscovery for copy hosts and enhance interface detection in build-and-copy and launch-cluster scripts
2025-12-19 10:36:39 -08:00
Eugene Rakhmatulin
244ad758d2
Updated README
2025-12-19 09:56:24 -08:00
Eugene Rakhmatulin
074316de68
Merge pull request #2
2025-12-19 08:59:29 -08:00
Eugene Rakhmatulin
23858a3c7f
Merge branch 'main' into pr-2
2025-12-19 08:51:52 -08:00
Eugene Rakhmatulin
de055928b8
Update CHANGELOG: Document --nccl-debug option for NCCL debug level control
2025-12-18 23:29:03 -08:00
Eugene Rakhmatulin
294d155532
Add NCCL debug level option to launch-cluster.sh
2025-12-18 23:28:12 -08:00
Eugene Rakhmatulin
0377e9badf
Bugfix: don't shut down on exit if cluster is already running
2025-12-18 23:12:39 -08:00
Eugene Rakhmatulin
2a2f8f24e2
Allow launch-cluster.sh to be executed in non-TTY environment
2025-12-18 23:02:58 -08:00
Eugene Rakhmatulin
8c53179cc2
Changed extra Docker args variable to VLLM_SPARK_EXTRA_DOCKER_ARGS for consistency
2025-12-18 22:27:27 -08:00
Eugene Rakhmatulin
cf937af897
Merge pull request #6
2025-12-18 22:17:12 -08:00
Eugene Rakhmatulin
cf9da89545
Updated README
2025-12-18 22:03:46 -08:00
Eugene Rakhmatulin
8a0cb3c853
Merge branch 'main' into pr-6
2025-12-18 22:02:13 -08:00
Eugene Rakhmatulin
442f7369ad
Updated build script to handle BUILD_JOBS argument
2025-12-18 22:02:04 -08:00
Eugene Rakhmatulin
e6efd668cd
Added Table of Contents to README
2025-12-18 15:43:09 -08:00
Eugene Rakhmatulin
8be691e806
Fixed issue with argument passing
2025-12-18 15:31:53 -08:00
Eugene Rakhmatulin
369283f655
Updated README.md with launch-cluster details.
2025-12-18 15:25:22 -08:00
Eugene Rakhmatulin
db5c443905
Enhance launch-cluster script with improved node detection and SSH scanning using netcat and Python
2025-12-18 14:52:23 -08:00
Eugene Rakhmatulin
6c04ebfca1
Refactor launch-cluster script to include cluster running checks and streamline start process for head and worker nodes
2025-12-18 14:50:26 -08:00
Eugene Rakhmatulin
f7a15bfaf5
Enhance launch-cluster script with improved SSH connectivity checks for worker nodes
2025-12-18 14:22:48 -08:00
Eugene Rakhmatulin
25b1d8eb4f
Enhance launch-cluster script with auto-detection for interfaces and nodes
2025-12-18 13:53:28 -08:00
Eugene Rakhmatulin
a1ed352635
Renamed launch-cluster for consistency
2025-12-18 13:11:48 -08:00
Eugene Rakhmatulin
20a6699bf7
Add launch_cluster script for managing cluster nodes and actions
2025-12-18 13:11:13 -08:00
Eugene Rakhmatulin
1025243316
Added launch_cluster script to simplify launching cluster on nodes.
2025-12-18 13:10:57 -08:00
Christopher Owen
a13a9f6806
Limit build parallelism to reduce OOM situations
2025-12-18 13:36:35 +01:00
Eric Lewis
11355677f6
Add parallel copy option to build-and-copy.sh
Introduced the --copy-parallel flag to enable concurrent copying of Docker images to multiple hosts. Updated the README with usage instructions and details about the new option. Refactored the script to support both serial and parallel copy modes for improved efficiency.
2025-12-18 01:24:48 -05:00
Eric Lewis
e67abd5e6e
Add multi-host copy support to build-and-copy.sh
Updated build-and-copy.sh to support copying Docker images to multiple hosts using the new -c/--copy-to flag, which accepts space- or comma-separated host lists. The old --copy-to-host flag is retained as an alias for backward compatibility, and -h is now used for help. The README was updated to document these changes and provide new usage examples.
2025-12-18 00:32:45 -05:00
Eugene Rakhmatulin
e0f6cff132
Merge pull request #1
2025-12-16 21:32:42 -08:00
TeskaLabs Admin
f1abfb85b6
Version bump
2025-12-16 17:58:48 +00:00
Eugene Rakhmatulin
79f6a204d1
Update README.md
2025-12-15 09:51:49 -08:00
Eugene Rakhmatulin
0606b1b984
Refactor Triton and vLLM reference handling in Dockerfile and build script
2025-12-14 23:28:08 -08:00
eugr
4551795908
Fixed missing InfiniBand dependency, added cuDNN
2025-12-14 21:49:50 -08:00