Commit Graph

362 Commits

Author
SHA1 Message Date
Eugene Rakhmatulin
8b7c02aa25 add .env support to build-and-copy.sh 2026-03-25 22:47:02 -07:00
Eugene Rakhmatulin
73fec1bdf8 bugfix 2026-03-25 15:40:09 -07:00
Eugene Rakhmatulin
2f5ff0211e Cleanup in build script 2026-03-25 15:39:23 -07:00
Eugene Rakhmatulin
63ee72e729 Merge branch '3-node' of gitlab.home.eugr.net:ai/spark-vllm into 3-node 2026-03-25 15:36:31 -07:00
Eugene Rakhmatulin
4a0feea6c3 Added --cleanup option to build script 2026-03-25 15:35:32 -07:00
Eugene Rakhmatulin
429042b7dc Revert "Added --cleanup option"
This reverts commit b8930b05a1.
2026-03-25 15:35:15 -07:00
Eugene Rakhmatulin
ef95336937 Merge branch '3-node' of gitlab.home.eugr.net:ai/spark-vllm into 3-node 2026-03-25 15:25:19 -07:00
Eugene Rakhmatulin
b8930b05a1 Added --cleanup option 2026-03-25 15:24:59 -07:00
Eugene Rakhmatulin
49d505ad14 Merge branch '3-node' of gitlab.home.eugr.net:ai/spark-vllm into 3-node 2026-03-25 15:16:47 -07:00
Eugene Rakhmatulin
1755dfd114 Added LOCAL_IP support 2026-03-25 15:16:06 -07:00
Eugene Rakhmatulin
3d4dc4c82e Merge branch '3-node' of gitlab.home.eugr.net:ai/spark-vllm into 3-node 2026-03-25 14:42:37 -07:00
Eugene Rakhmatulin
07fac71dac Fixed bug with CONTAINER_NAME variable 2026-03-25 14:42:01 -07:00
Eugene Rakhmatulin
1702f47df6 Merge branch '3-node' of gitlab.home.eugr.net:ai/spark-vllm into 3-node 2026-03-25 14:18:32 -07:00
Eugene Rakhmatulin
ad2cd3373f .env configuration support for launch-cluster.sh 2026-03-25 14:18:00 -07:00
Eugene Rakhmatulin
1fd8c7afc3 Merge branch 'main' into 3-node 2026-03-25 12:45:40 -07:00
Eugene Rakhmatulin
3dcd2a90c1 Updated Nemotron-3-Super recipe 2026-03-25 12:44:44 -07:00
Eugene Rakhmatulin
efacbd69f2 Updated Nemotron3-Super recipe 2026-03-25 12:43:12 -07:00
Eugene Rakhmatulin
c4b078b868 Merge branch 'main' into 3-node 2026-03-24 22:21:25 -07:00
Eugene Rakhmatulin
3be2fb24a8 Merge pull request #122 2026-03-24 22:18:52 -07:00
Eugene Rakhmatulin
7fa69187df metadata changes 2026-03-24 22:18:07 -07:00
Drew Botwinick
8298c3d7f8 Merge remote-tracking branch 'upstream/main'
# Conflicts:
#	Dockerfile
2026-03-24 15:41:09 -05:00
Eugene Rakhmatulin
f8c2653fd3 Quick fix for NCCL dependency 2026-03-23 23:20:59 -07:00
Eugene Rakhmatulin
990a7b3837 Use mesh-optimized NCCL 2026-03-23 15:43:18 -07:00
Eugene Rakhmatulin
9e089acf2b Updated Nemotron recipes to use VLLM CUTLASS 2026-03-22 23:03:24 -07:00
Eugene Rakhmatulin
2d749742e4 Changed base image back to base CUDA development one 2026-03-21 18:11:20 -07:00
Eugene Rakhmatulin
7a54657abf Revert "cuda 13.2 torch"
This reverts commit 926dd57a87.
2026-03-21 15:36:17 -07:00
Eugene Rakhmatulin
926dd57a87 cuda 13.2 torch 2026-03-21 15:15:01 -07:00
Eugene Rakhmatulin
6e8d85c914 cleanup 2026-03-21 15:12:12 -07:00
Drew Botwinick
d6e76f8e2f add build metadata generation and include in Dockerfiles 2026-03-21 16:10:04 -05:00
Eugene Rakhmatulin
8385506c5e Fixes 2026-03-20 23:51:21 -07:00
Eugene Rakhmatulin
8caebe3155 Reverting back to CUDA image + pytorch from wheels 2026-03-20 17:03:18 -07:00
Eugene Rakhmatulin
919a881cb1 Merge branch 'main' of gitlab.home.eugr.net:ai/spark-vllm 2026-03-18 22:03:25 -07:00
Eugene Rakhmatulin
8ddc259619 Fixed #111 2026-03-18 22:03:04 -07:00
eugr
22f3fa6c21 Merge pull request #103 from apairmont/network_arg
Add docker --network arg to common build flags
2026-03-18 21:48:48 -07:00
Eugene Rakhmatulin
15d295887c Updated README to reflect --master-port parameter 2026-03-18 21:23:28 -07:00
Eugene Rakhmatulin
7e4150feed Added master-port argument 2026-03-18 16:57:55 -07:00
eugr
7b752c31c5 Merge pull request #110 from voloszad/patch-1
Remove run-cluster-node.sh script copy and permission commands from Dockerfile.mxfp4
2026-03-18 14:54:11 -07:00
Andrej V.
bdd2b10f54 Remove script copy and permission commands from Dockerfile
Removed script copying and permission setting for run-cluster-node.sh.
2026-03-18 21:57:56 +01:00
Eugene Rakhmatulin
2755b62d12 Fixes #108 2026-03-18 13:26:39 -07:00
Eugene Rakhmatulin
f327b92abe Fixes #106 and #108 2026-03-18 13:06:44 -07:00
Eugene Rakhmatulin
57b458570e Added experimental Qwen3.5-397B support for dual Spark configuration 2026-03-17 19:05:36 -07:00
Eugene Rakhmatulin
57ed099465 Updated README file to reflect new launch-cluster options. 2026-03-17 16:16:04 -07:00
Eugene Rakhmatulin
fb0687cd1b Updated README to describe no-ray mode 2026-03-17 15:27:22 -07:00
Eugene Rakhmatulin
ccea2ba861 Bugfixes 2026-03-17 13:54:42 -07:00
Eugene Rakhmatulin
957605498c Added extra passthrough variables to run-recipe 2026-03-17 13:41:40 -07:00
Eugene Rakhmatulin
b1eeefc0eb Changed Nemotron-3-Nano-NVFP4 to Marlin backend 2026-03-17 13:10:48 -07:00
Alan Pairmont
b879b7748f add network arg to common build flags 2026-03-16 12:09:59 -04:00
Eugene Rakhmatulin
fa645f3e4b bugfixes 2026-03-13 13:39:30 -07:00
Eugene Rakhmatulin
dedbd0a01d bugfixes 2026-03-13 12:41:48 -07:00
Eugene Rakhmatulin
caa83d9e5b Bugfixes 2026-03-13 12:32:43 -07:00