85 Commits

Author SHA1 Message Date
Eugene Rakhmatulin
dfe426e912 Add support for pre-release FlashInfer packages in Docker builds 2025-12-20 23:13:26 -08:00
Eugene Rakhmatulin
76988e0c75 Added --use-wheels to use precompiled vLLM wheels instead of compiling from the source 2025-12-20 20:25:07 -08:00
Eugene Rakhmatulin
0cac77c286 Fixed contributor username 2025-12-19 10:41:03 -08:00
Eugene Rakhmatulin
3eb57a6d49 Updated README - autodiscovery in copy ops 2025-12-19 10:39:28 -08:00
Eugene Rakhmatulin
244ad758d2 Updated README 2025-12-19 09:56:24 -08:00
Eugene Rakhmatulin
23858a3c7f Merge branch 'main' into pr-2 2025-12-19 08:51:52 -08:00
Eugene Rakhmatulin
de055928b8 Update CHANGELOG: Document --nccl-debug option for NCCL debug level control 2025-12-18 23:29:03 -08:00
Eugene Rakhmatulin
294d155532 Add NCCL debug level option to launch-cluster.sh 2025-12-18 23:28:12 -08:00
Eugene Rakhmatulin
8c53179cc2 changed extra docker args variable to VLLM_SPARK_EXTRA_DOCKER_ARGS for consistency 2025-12-18 22:27:27 -08:00
Eugene Rakhmatulin
cf9da89545 Updated README 2025-12-18 22:03:46 -08:00
Eugene Rakhmatulin
e6efd668cd Added Table of Contents to README 2025-12-18 15:43:09 -08:00
Eugene Rakhmatulin
8be691e806 Fixed issue with argument passing 2025-12-18 15:31:53 -08:00
Eugene Rakhmatulin
369283f655 Updated README.md with launch-cluster details. 2025-12-18 15:25:22 -08:00
Eric Lewis
11355677f6 Add parallel copy option to build-and-copy.sh
Introduced the --copy-parallel flag to enable concurrent copying of Docker images to multiple hosts. Updated the README with usage instructions and details about the new option. Refactored the script to support both serial and parallel copy modes for improved efficiency.
2025-12-18 01:24:48 -05:00
Eric Lewis
e67abd5e6e Add multi-host copy support to build-and-copy.sh
Updated build-and-copy.sh to support copying Docker images to multiple hosts using the new -c/--copy-to flag, which accepts space- or comma-separated host lists. The old --copy-to-host flag is retained as an alias for backward compatibility, and -h is now used for help. The README was updated to document these changes and provide new usage examples.
2025-12-18 00:32:45 -05:00
Eugene Rakhmatulin
79f6a204d1 Update README.md 2025-12-15 09:51:49 -08:00
eugr
02f842e1fd Updated README 2025-12-14 00:39:15 -08:00
eugr
295e1f2266 Removed MiniMax M2 temporary patch from Dockerfile; updated README.md 2025-12-11 13:24:57 -08:00
eugr
5fba205db4 Implemented a temporary patch for recently broken MiniMax-M2 (in builds after 12/10) for some quants. 2025-12-11 11:13:05 -08:00
eugr
9d351cd6d5 Updated README 2025-12-05 11:32:02 -08:00
eugr
270446be27 Add build-and-copy script for automated image building and deployment 2025-12-05 11:28:43 -08:00
eugr
6a66a4b66f Added patch to allow fastsafetensors in cluster config 2025-11-26 21:25:04 -08:00
eugr
712637a348 Added second RoCE interface to examples 2025-11-26 19:53:37 -08:00
eugr
bdf16a0a34 Formatting 2025-11-26 14:02:15 -08:00
eugr
cf8e411ad2 Added benchmarking 2025-11-26 14:01:04 -08:00
eugr
676fa2ace9 Formatting fix 2025-11-26 13:52:30 -08:00
eugr
4f27899939 Added some details on networking 2025-11-26 13:50:39 -08:00
eugr
1a4bc1d7aa Typo 2025-11-26 13:44:34 -08:00
eugr
2a7d31ad81 Updated README 2025-11-26 13:30:17 -08:00
eugr
a93bd56389 Updated README 2025-11-24 21:44:01 -08:00
eugr
bd48032c45 Fixed typo in docker command in README 2025-11-24 16:34:19 -08:00
eugr
2cfa1db2cf Updated README 2025-11-24 16:32:47 -08:00
eugr
6d6e4dfe50 Updated README 2025-11-24 16:23:00 -08:00
eugr
5c8feb086c Updated README 2025-11-24 15:32:28 -08:00
eugr
3ecca4d2b7 Updated Dockerfile to include 2 levels of cache busters, added the cluster script and README. 2025-11-24 15:21:08 -08:00
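
Commits e67abd5e6e and 11355677f6 above describe a -c/--copy-to flag that accepts a space- or comma-separated host list, plus a --copy-parallel flag that switches between serial and concurrent copying. The following is a minimal bash sketch of how such behavior could be structured. It is not the code from build-and-copy.sh: the function names parse_hosts and for_each_host are hypothetical, and the actual copy mechanism is not shown in this log, so the per-host command is left as a caller-supplied function.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not taken from build-and-copy.sh.

# Hypothetical helper: split a space- or comma-separated host list
# into one host per line. Empty fields (e.g. "a,,b") are dropped.
parse_hosts() {
    local raw="$*"
    local host
    # Replace commas with spaces, then rely on shell word splitting.
    # Assumes host names contain no glob characters such as '*'.
    for host in ${raw//,/ }; do
        printf '%s\n' "$host"
    done
}

# Hypothetical helper: run a command once per host.
# Usage: for_each_host <serial|parallel> <command> <host>...
# Returns non-zero if the command failed for any host.
for_each_host() {
    local mode="$1" cmd="$2"
    shift 2
    local host pid status=0
    local pids=()
    for host in "$@"; do
        if [ "$mode" = "parallel" ]; then
            # Start the job in the background and remember its PID.
            "$cmd" "$host" &
            pids+=("$!")
        else
            "$cmd" "$host" || status=1
        fi
    done
    # Collect exit codes of background jobs (no-op in serial mode).
    # The ${pids[@]+...} form avoids an "unbound variable" error
    # on an empty array when 'set -u' is active in older bash.
    for pid in ${pids[@]+"${pids[@]}"}; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Example (copy_image is a stand-in for the real per-host copy step):
#   copy_image() { echo "would copy image to $1"; }
#   for_each_host parallel copy_image $(parse_hosts "spark1,spark2 spark3")
```

Waiting on each background PID individually, rather than calling a bare wait, lets the caller see whether any single host failed, which matters when one copy in a parallel batch breaks while the others succeed.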