Author | Commit | Message | Date
Eugene Rakhmatulin | 07fac71dac | Fixed bug with CONTAINER_NAME variable | 2026-03-25 14:42:01 -07:00
Eugene Rakhmatulin | 1702f47df6 | Merge branch '3-node' of gitlab.home.eugr.net:ai/spark-vllm into 3-node | 2026-03-25 14:18:32 -07:00
Eugene Rakhmatulin | ad2cd3373f | .env configuration support for launch-cluster.sh | 2026-03-25 14:18:00 -07:00
Eugene Rakhmatulin | 1fd8c7afc3 | Merge branch 'main' into 3-node | 2026-03-25 12:45:40 -07:00
Eugene Rakhmatulin | 3dcd2a90c1 | Updated Nemotron-3-Super recipe | 2026-03-25 12:44:44 -07:00
Eugene Rakhmatulin | efacbd69f2 | Updated Nemotron3-Super recipe | 2026-03-25 12:43:12 -07:00
Eugene Rakhmatulin | c4b078b868 | Merge branch 'main' into 3-node | 2026-03-24 22:21:25 -07:00
Eugene Rakhmatulin | 3be2fb24a8 | Merge pull request #122 | 2026-03-24 22:18:52 -07:00
Eugene Rakhmatulin | 7fa69187df | metadata changes | 2026-03-24 22:18:07 -07:00
Drew Botwinick | 8298c3d7f8 | Merge remote-tracking branch 'upstream/main' (conflicts: Dockerfile) | 2026-03-24 15:41:09 -05:00
Eugene Rakhmatulin | f8c2653fd3 | Quick fix for NCCL dependency | 2026-03-23 23:20:59 -07:00
Eugene Rakhmatulin | 990a7b3837 | Use mesh-optimized NCCL | 2026-03-23 15:43:18 -07:00
Eugene Rakhmatulin | 9e089acf2b | Updated Nemotron recipes to use VLLM CUTLASS | 2026-03-22 23:03:24 -07:00
Eugene Rakhmatulin | 2d749742e4 | Changed base image back to base CUDA development one | 2026-03-21 18:11:20 -07:00
Eugene Rakhmatulin | 7a54657abf | Revert "cuda 13.2 torch". This reverts commit 926dd57a87. | 2026-03-21 15:36:17 -07:00
Eugene Rakhmatulin | 926dd57a87 | cuda 13.2 torch | 2026-03-21 15:15:01 -07:00
Eugene Rakhmatulin | 6e8d85c914 | cleanup | 2026-03-21 15:12:12 -07:00
Drew Botwinick | d6e76f8e2f | add build metadata generation and include in Dockerfiles | 2026-03-21 16:10:04 -05:00
Eugene Rakhmatulin | 8385506c5e | Fixes | 2026-03-20 23:51:21 -07:00
Eugene Rakhmatulin | 8caebe3155 | Reverting back to CUDA image + pytorch from wheels | 2026-03-20 17:03:18 -07:00
Eugene Rakhmatulin | 919a881cb1 | Merge branch 'main' of gitlab.home.eugr.net:ai/spark-vllm | 2026-03-18 22:03:25 -07:00
Eugene Rakhmatulin | 8ddc259619 | Fixed #111 | 2026-03-18 22:03:04 -07:00
eugr | 22f3fa6c21 | Merge pull request #103 from apairmont/network_arg: Add docker --network arg to common build flags | 2026-03-18 21:48:48 -07:00
Eugene Rakhmatulin | 15d295887c | Updated README to reflect --master-port parameter | 2026-03-18 21:23:28 -07:00
Eugene Rakhmatulin | 7e4150feed | Added master-port argument | 2026-03-18 16:57:55 -07:00
eugr | 7b752c31c5 | Merge pull request #110 from voloszad/patch-1: Remove run-cluster-node.sh script copy and permission commands from Dockerfile.mxfp4 | 2026-03-18 14:54:11 -07:00
Andrej V. | bdd2b10f54 | Remove script copy and permission commands from Dockerfile. Removed script copying and permission setting for run-cluster-node.sh. | 2026-03-18 21:57:56 +01:00
Eugene Rakhmatulin | 2755b62d12 | Fixes #108 | 2026-03-18 13:26:39 -07:00
Eugene Rakhmatulin | f327b92abe | Fixes #106 and #108 | 2026-03-18 13:06:44 -07:00
Eugene Rakhmatulin | 57b458570e | Added experimental Qwen3.5-397B support for dual Spark configuration | 2026-03-17 19:05:36 -07:00
Eugene Rakhmatulin | 57ed099465 | Updated README file to reflect new launch-cluster options. | 2026-03-17 16:16:04 -07:00
Eugene Rakhmatulin | fb0687cd1b | Updated README to describe no-ray mode | 2026-03-17 15:27:22 -07:00
Eugene Rakhmatulin | ccea2ba861 | Bugfixes | 2026-03-17 13:54:42 -07:00
Eugene Rakhmatulin | 957605498c | Added extra passthrough variables to run-recipe | 2026-03-17 13:41:40 -07:00
Eugene Rakhmatulin | b1eeefc0eb | Changed Nemotron-3-Nano-NVFP4 to Marlin backend | 2026-03-17 13:10:48 -07:00
Alan Pairmont | b879b7748f | add network arg to common build flags | 2026-03-16 12:09:59 -04:00
Eugene Rakhmatulin | fa645f3e4b | bugfixes | 2026-03-13 13:39:30 -07:00
Eugene Rakhmatulin | dedbd0a01d | bugfixes | 2026-03-13 12:41:48 -07:00
Eugene Rakhmatulin | caa83d9e5b | Bugfixes | 2026-03-13 12:32:43 -07:00
Eugene Rakhmatulin | 4bcbbaa25a | Bugfixes | 2026-03-13 12:23:41 -07:00
Eugene Rakhmatulin | d08266a123 | Bugfixes | 2026-03-13 12:18:22 -07:00
Eugene Rakhmatulin | 03b055d7f0 | Major cluster orchestration refactoring to support running without Ray | 2026-03-13 11:55:18 -07:00
Eugene Rakhmatulin | d609fecef3 | Merge branch 'main' of github.com:eugr/spark-vllm-docker | 2026-03-12 15:04:41 -07:00
eugr | 7c198b1ceb | Merge pull request #90 from sonusflow/pr/qwen35-397b-tp4: Add Qwen3.5-397B INT4-AutoRound TP=4 recipe (37 tok/s) | 2026-03-12 15:04:23 -07:00
Eugene Rakhmatulin | 8ae51192e5 | Experimental mod to support gpu-memory-utilization-gb | 2026-03-12 13:37:44 -07:00
Eugene Rakhmatulin | 8fec9bed06 | Updated Nemotron to support dual sparks | 2026-03-12 13:30:15 -07:00
Eugene Rakhmatulin | 6a323cc6f5 | Merge pull request #93 | 2026-03-12 13:00:13 -07:00
Eugene Rakhmatulin | 6f9a2f981c | Adjusted model parameters | 2026-03-12 12:59:05 -07:00
remi | 122edc8229 | super nemotron mod & recipe for nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 | 2026-03-11 20:53:44 +01:00
Eugene Rakhmatulin | 7ceea85647 | Fixed qwen3-coder-next-int4-autoround to exclude Ray | 2026-03-11 11:20:56 -07:00