Commit Graph

12 Commits

Author SHA1 Message Date
Eugene Rakhmatulin
a351f182cc Implement autodiscovery for copy hosts and enhance interface detection in build-and-copy and launch-cluster scripts 2025-12-19 10:36:39 -08:00
Eugene Rakhmatulin
294d155532 Add NCCL debug level option to launch-cluster.sh 2025-12-18 23:28:12 -08:00
Eugene Rakhmatulin
0377e9badf Bugfix: don't shut down on exit if cluster is already running 2025-12-18 23:12:39 -08:00
Eugene Rakhmatulin
2a2f8f24e2 Allow launch-cluster.sh to be executed in non-TTY environment 2025-12-18 23:02:58 -08:00
Eugene Rakhmatulin
8c53179cc2 changed extra docker args variable to VLLM_SPARK_EXTRA_DOCKER_ARGS for consistency 2025-12-18 22:27:27 -08:00
Eugene Rakhmatulin
8be691e806 Fixed issue with argument passing 2025-12-18 15:31:53 -08:00
Eugene Rakhmatulin
369283f655 Updated README.md with launch-cluster details. 2025-12-18 15:25:22 -08:00
Eugene Rakhmatulin
db5c443905 Enhance launch-cluster script with improved node detection and SSH scanning using netcat and Python 2025-12-18 14:52:23 -08:00
Eugene Rakhmatulin
6c04ebfca1 Refactor launch-cluster script to include cluster running checks and streamline start process for head and worker nodes 2025-12-18 14:50:26 -08:00
Eugene Rakhmatulin
f7a15bfaf5 Enhance launch-cluster script with improved SSH connectivity checks for worker nodes 2025-12-18 14:22:48 -08:00
Eugene Rakhmatulin
25b1d8eb4f Enhance launch-cluster script with auto-detection for interfaces and nodes 2025-12-18 13:53:28 -08:00
Eugene Rakhmatulin
a1ed352635 renamed launch-cluster for consistency 2025-12-18 13:11:48 -08:00