Eugene Rakhmatulin
1e5aa060b8
Updated README to include networking guide
2026-02-03 14:14:05 -08:00
Raphael Amorim
30f16f1d4e
feat: Add recipe-based one-click model deployment system
Introduces a YAML recipe system for simplified model deployment:
- run-recipe.py: Main script handling build, download, and launch
- run-recipe.sh: Bash wrapper for dependency management
- recipes/: Pre-configured recipes for common models
  - glm-4.7-flash-awq.yaml: GLM-4.7-Flash with AWQ quantization
  - glm-4.7-nvfp4.yaml: GLM-4.7 with NVFP4 (cluster-only)
  - minimax-m2-awq.yaml: MiniMax M2 with AWQ
  - openai-gpt-oss-120b.yaml: OpenAI GPT-OSS 120B with MXFP4
Key features:
- Auto-discover cluster nodes with --discover and save them to .env
- Automatically load nodes from .env on subsequent runs
- cluster_only flag for models requiring multi-node setup
- build_args field for Dockerfile selection (--pre-tf, --exp-mxfp4)
- Solo mode auto-strips --distributed-executor-backend ray
- --setup flag for full build + download + run workflow
- --dry-run to preview execution without running
Usage:
  ./run-recipe.sh --discover                        # Find and save cluster nodes
  ./run-recipe.sh glm-4.7-flash-awq --solo --setup
  ./run-recipe.sh glm-4.7-nvfp4 --setup             # Uses nodes from .env
2026-02-03 16:09:12 -05:00
Eugene Rakhmatulin
ecf7f5f7b5
Merge branch 'main' into pytorch-base
2026-02-03 12:55:03 -08:00
Eugene Rakhmatulin
f8eb294c58
Updated README.md and added Networking Guide.
2026-02-03 12:54:38 -08:00
Eugene Rakhmatulin
4b9ab0de7c
Added the ability to launch an NGC container in the cluster
2026-02-02 16:57:04 -08:00
Eugene Rakhmatulin
997bf9ea0e
Merge branch 'main' into pytorch-base
2026-02-02 12:44:15 -08:00
Eugene Rakhmatulin
4634ee92a2
Added a mod for Nemotron Nano
2026-02-02 11:58:07 -08:00
Eugene Rakhmatulin
37953478f0
changed arch codes again to be in line with upcoming PR
2026-02-02 09:21:48 -08:00
Raphael Amorim
751bc5a47a
Adding sample profile and profile loader
2026-02-02 10:25:53 -05:00
Eugene Rakhmatulin
3c7f91081d
changed arch flags
2026-02-01 16:37:01 -08:00
Eugene Rakhmatulin
5f7d480801
Reverted Triton removal to use system triton package
2026-01-31 23:23:59 -08:00
Eugene Rakhmatulin
133ed9cfb9
bumped up MXFP4 base image version
2026-01-31 16:17:58 -08:00
Eugene Rakhmatulin
c81edce091
bumped up MXFP4 base image version
2026-01-31 16:12:33 -08:00
Eugene Rakhmatulin
9691eed1b0
Disabled Triton build for now
2026-01-31 00:10:52 -08:00
Eugene Rakhmatulin
7c61b4057c
Added Triton compilation to custom build
2026-01-30 23:44:20 -08:00
Eugene Rakhmatulin
0482435848
Restore previous wheels build
2026-01-30 18:43:39 -08:00
Eugene Rakhmatulin
a6d6bafa69
Merge branch 'main' into pytorch-base
2026-01-30 17:06:29 -08:00
Eugene Rakhmatulin
4a4b4e7610
Fixed a bug where solo mode failed on a standalone Spark without RoCE configured.
2026-01-30 16:39:11 -08:00
Eugene Rakhmatulin
a4b524625a
using "from scratch" build for wheels to reduce image size
2026-01-30 16:29:47 -08:00
Eugene Rakhmatulin
518dc0108b
moved deps buster
2026-01-30 15:25:54 -08:00
Eugene Rakhmatulin
57c890b10c
Reduced MXFP4 container size
2026-01-30 15:18:42 -08:00
Eugene Rakhmatulin
008af21383
Merge branch 'main' into pytorch-base
2026-01-30 13:37:03 -08:00
Eugene Rakhmatulin
a13c7d3007
cosmetic changes
2026-01-30 13:26:57 -08:00
Eugene Rakhmatulin
7dd0642621
Reduced final image size
2026-01-30 13:16:55 -08:00
Eugene Rakhmatulin
be19675980
Fixed initial vllm source fetch when not using the main branch
2026-01-30 11:24:51 -08:00
Eugene Rakhmatulin
3a68e1ca46
Fixed #25
2026-01-30 11:20:29 -08:00
Eugene Rakhmatulin
af6d5eae32
Temporarily removing incompatible triton-kernels
2026-01-30 11:17:38 -08:00
Eugene Rakhmatulin
7d232a305a
Reverted to Torch 2.9.1 in the source build to address #24
2026-01-30 10:43:12 -08:00
Eugene Rakhmatulin
34bd3ae39c
Fixed fetching vllm source code in MXFP4 version.
2026-01-30 09:07:01 -08:00
Eugene Rakhmatulin
458439706a
Build flashinfer from source
2026-01-30 09:05:22 -08:00
Eugene Rakhmatulin
ef0f996df6
Bumped base image version; reverted Triton to 3.5.1
2026-01-29 23:14:43 -08:00
Eugene Rakhmatulin
0ac438b4dd
Some optimizations
2026-01-29 22:08:05 -08:00
Eugene Rakhmatulin
a5b693cc1e
Merge branch 'main' into pytorch-base
2026-01-29 18:18:35 -08:00
Eugene Rakhmatulin
ace61c2d55
added new mod for glm4.7-flash-awq, solo mode support.
2026-01-29 18:18:00 -08:00
Eugene Rakhmatulin
46fecd172a
added missing dependency
2026-01-29 17:01:17 -08:00
Eugene Rakhmatulin
159460af0c
Migrated dockerfiles to pytorch-base image
2026-01-29 15:47:07 -08:00
Eugene Rakhmatulin
067bbbbb2d
Merge branch 'mxfp4'
2026-01-29 14:20:07 -08:00
Eugene Rakhmatulin
9a907caffc
mxfp4 dockerfile optimizations
2026-01-29 14:17:36 -08:00
Eugene Rakhmatulin
7a81e90cd2
added -e parameter
2026-01-29 13:06:22 -08:00
Eugene Rakhmatulin
53a8b45bcb
Added experimental MXFP4 optimizations
2026-01-29 11:56:17 -08:00
Eugene Rakhmatulin
b58ba7b19a
Added cubins and jit-cache
2026-01-29 11:42:04 -08:00
Eugene Rakhmatulin
36e3b7af27
Removed unnecessary dependencies
2026-01-29 09:58:44 -08:00
Eugene Rakhmatulin
e4b57633fe
moved everything to uv
2026-01-29 08:34:49 -08:00
Eugene Rakhmatulin
a3afb6f313
Merge branch 'main' into mxfp4
2026-01-28 13:25:26 -08:00
Eugene Rakhmatulin
74c02c37c2
warning message about wheel builds
2026-01-28 13:25:02 -08:00
Eugene Rakhmatulin
cef3727f26
Updated SHA for repos
2026-01-28 13:20:03 -08:00
Eugene Rakhmatulin
6b11902cc8
Updated README
2026-01-26 23:18:27 -08:00
Eugene Rakhmatulin
564afc1f6b
Working MXFP4 fork, updated build script
2026-01-26 22:31:46 -08:00
Eugene Rakhmatulin
90c8b30276
Merge branch 'main' into mxfp4
2026-01-26 16:17:58 -08:00
Eugene Rakhmatulin
e817f3dbec
Updated Triton version to 3.6.0
2026-01-26 14:24:58 -08:00