Commit Graph

74 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Eugene Rakhmatulin | 15d295887c | Updated README to reflect --master-port parameter | 2026-03-18 21:23:28 -07:00 |
| Eugene Rakhmatulin | 57b458570e | Added experimental Qwen3.5-397B support for dual Spark configuration | 2026-03-17 19:05:36 -07:00 |
| Eugene Rakhmatulin | 57ed099465 | Updated README file to reflect new launch-cluster options. | 2026-03-17 16:16:04 -07:00 |
| Eugene Rakhmatulin | fb0687cd1b | Updated README to describe no-ray mode | 2026-03-17 15:27:22 -07:00 |
| Eugene Rakhmatulin | 8fec9bed06 | Updated Nemotron to support dual sparks | 2026-03-12 13:30:15 -07:00 |
| Eugene Rakhmatulin | 45066e2b16 | Updated README | 2026-03-11 09:57:34 -07:00 |
| Eugene Rakhmatulin | 505a060a7d | vLLM prebuilt wheels support | 2026-03-04 16:01:50 -08:00 |
| Eugene Rakhmatulin | 7d8465fd9c | Added recipe for qwen3.5-122b-int4-autoround, updated README | 2026-03-02 12:18:16 -08:00 |
| Eugene Rakhmatulin | f09c2c3ac8 | Refactoring, updated README | 2026-02-18 15:58:53 -08:00 |
| Eugene Rakhmatulin | bd3f45f920 | Updated MXFP4 build to use fresh repo references | 2026-02-18 13:35:09 -08:00 |
| Eugene Rakhmatulin | 5b2313dddb | Changed KV type to fp8 in qwen3-coder-next recipe and reduced default context size to 131072 to ensure it all fits in a single Spark. | 2026-02-17 13:07:54 -08:00 |
| Eugene Rakhmatulin | f886505436 | Added --non-privileged flag to launch-cluster.sh | 2026-02-15 00:12:06 -08:00 |
| Eugene Rakhmatulin | 701147b1eb | Qwen3-Coder-Next fixes and updated recipe | 2026-02-12 15:56:32 -08:00 |
| Eugene Rakhmatulin | 3b1e49dcb0 | Supporting other CUDA archs via --gpu-arch flag | 2026-02-11 13:10:41 -08:00 |
| Eugene Rakhmatulin | 6d3f5dfd5c | map flashinfer/torch/triton cache directories by default | 2026-02-10 16:36:02 -08:00 |
| Eugene Rakhmatulin | ace16f3a8f | Applied new fastsafetensors fix to mxfp4 build; disabled wheel builds by default | 2026-02-09 23:47:06 -08:00 |
| Eugene Rakhmatulin | 2923fe6ea5 | Removed temp fastsafetensors patch | 2026-02-09 10:21:14 -08:00 |
| Eugene Rakhmatulin | ec987259a0 | Recipes and Launch Script support | 2026-02-04 12:01:53 -08:00 |
| Eugene Rakhmatulin | ef6a5eca29 | Merge branch 'main' into pr-19 | 2026-02-04 11:36:59 -08:00 |
| Eugene Rakhmatulin | 1e5aa060b8 | Updated README to include networking guide | 2026-02-03 14:14:05 -08:00 |
| Eugene Rakhmatulin | f8eb294c58 | Updated README.md and added Networking Guide. | 2026-02-03 12:54:38 -08:00 |
| Eugene Rakhmatulin | 4b9ab0de7c | Added ability to launch NGC container in the cluster | 2026-02-02 16:57:04 -08:00 |
| Eugene Rakhmatulin | 4634ee92a2 | Added a mod for Nemotron Nano | 2026-02-02 11:58:07 -08:00 |
| Raphael Amorim | 751bc5a47a | Adding sample profile and profile loader | 2026-02-02 10:25:53 -05:00 |
| Eugene Rakhmatulin | ace61c2d55 | added new mod for glm4.7-flash-awq, solo model support. | 2026-01-29 18:18:00 -08:00 |
| Eugene Rakhmatulin | 7a81e90cd2 | added -e parameter | 2026-01-29 13:06:22 -08:00 |
| Eugene Rakhmatulin | a3afb6f313 | Merge branch 'main' into mxfp4 | 2026-01-28 13:25:26 -08:00 |
| Eugene Rakhmatulin | 74c02c37c2 | warning message about wheel builds | 2026-01-28 13:25:02 -08:00 |
| Eugene Rakhmatulin | 6b11902cc8 | Updated README | 2026-01-26 23:18:27 -08:00 |
| Eugene Rakhmatulin | 18a25c8382 | Updated README | 2026-01-08 14:38:12 -08:00 |
| Eugene Rakhmatulin | 4ee090f632 | Updated README re: hf-download option | 2025-12-24 08:37:33 -08:00 |
| Eugene Rakhmatulin | 04e6d27b84 | Updated README re: mods functionality | 2025-12-23 18:09:59 -08:00 |
| Eugene Rakhmatulin | 786a50c5c7 | Updated README | 2025-12-21 22:41:48 -08:00 |
| Eugene Rakhmatulin | 1139a37324 | Added transformers v5 support | 2025-12-21 22:41:03 -08:00 |
| Eugene Rakhmatulin | c37053adf6 | Updated README | 2025-12-21 14:57:35 -08:00 |
| Eugene Rakhmatulin | 82802f0cad | Added Quickstart section to README | 2025-12-21 14:53:05 -08:00 |
| Eugene Rakhmatulin | bbd3469549 | Support vLLM release wheels | 2025-12-21 11:15:52 -08:00 |
| Eugene Rakhmatulin | 2aa545a810 | Added PSA about build cache | 2025-12-21 00:49:59 -08:00 |
| Eugene Rakhmatulin | 63a1a6a97c | Update README to reflect reduced build time and container size for vLLM | 2025-12-20 23:16:12 -08:00 |
| Eugene Rakhmatulin | dfe426e912 | Add support for pre-release FlashInfer packages in Docker builds | 2025-12-20 23:13:26 -08:00 |
| Eugene Rakhmatulin | 76988e0c75 | Added --use-wheels to use precompiled vLLM wheels instead of compiling from the source | 2025-12-20 20:25:07 -08:00 |
| Eugene Rakhmatulin | 0cac77c286 | Fixed contributor username | 2025-12-19 10:41:03 -08:00 |
| Eugene Rakhmatulin | 3eb57a6d49 | Updated README - autodiscovery in copy ops | 2025-12-19 10:39:28 -08:00 |
| Eugene Rakhmatulin | 244ad758d2 | Updated README | 2025-12-19 09:56:24 -08:00 |
| Eugene Rakhmatulin | 23858a3c7f | Merge branch 'main' into pr-2 | 2025-12-19 08:51:52 -08:00 |
| Eugene Rakhmatulin | de055928b8 | Update CHANGELOG: Document --nccl-debug option for NCCL debug level control | 2025-12-18 23:29:03 -08:00 |
| Eugene Rakhmatulin | 294d155532 | Add NCCL debug level option to launch-cluster.sh | 2025-12-18 23:28:12 -08:00 |
| Eugene Rakhmatulin | 8c53179cc2 | changed extra docker args variable to VLLM_SPARK_EXTRA_DOCKER_ARGS for consistency | 2025-12-18 22:27:27 -08:00 |
| Eugene Rakhmatulin | cf9da89545 | Updated README | 2025-12-18 22:03:46 -08:00 |
| Eugene Rakhmatulin | e6efd668cd | Added Table of Contents to README | 2025-12-18 15:43:09 -08:00 |