Raphael Amorim
28ba6090fc
Adding suggestions from Eugr and unit tests
2026-02-03 17:32:59 -05:00
Raphael Amorim
30f16f1d4e
feat: Add recipe-based one-click model deployment system
Introduces a YAML recipe system for simplified model deployment:
- run-recipe.py: Main script handling build, download, and launch
- run-recipe.sh: Bash wrapper for dependency management
- recipes/: Pre-configured recipes for common models
  - glm-4.7-flash-awq.yaml: GLM-4.7-Flash with AWQ quantization
  - glm-4.7-nvfp4.yaml: GLM-4.7 with NVFP4 (cluster-only)
  - minimax-m2-awq.yaml: MiniMax M2 with AWQ
  - openai-gpt-oss-120b.yaml: OpenAI GPT-OSS 120B with MXFP4
Key features:
- Auto-discover cluster nodes with --discover, saves to .env
- Load nodes from .env automatically on subsequent runs
- cluster_only flag for models requiring multi-node setup
- build_args field for Dockerfile selection (--pre-tf, --exp-mxfp4)
- Solo mode auto-strips --distributed-executor-backend ray
- --setup flag for full build + download + run workflow
- --dry-run to preview execution without running
Usage:
./run-recipe.sh --discover # Find and save cluster nodes
./run-recipe.sh glm-4.7-flash-awq --solo --setup
./run-recipe.sh glm-4.7-nvfp4 --setup # Uses nodes from .env
2026-02-03 16:09:12 -05:00
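The solo-mode behaviour listed in the commit above (auto-stripping `--distributed-executor-backend ray`) can be illustrated with a minimal sketch. This is not the code from run-recipe.py: the function name `strip_ray_backend` and the list-of-arguments representation are assumptions made for illustration, and the sketch only shows one plausible way to drop the flag from a launch command.

```python
# Hypothetical helper, not taken from run-recipe.py: removes the Ray
# executor flag from a launch command so it can run on a single node.
FLAG = "--distributed-executor-backend"


def strip_ray_backend(args):
    """Return a copy of args without the Ray distributed-executor flag."""
    cleaned = []
    i = 0
    while i < len(args):
        arg = args[i]
        # Form 1: flag and value as two separate tokens.
        if arg == FLAG and i + 1 < len(args) and args[i + 1] == "ray":
            i += 2  # skip both the flag and its value
            continue
        # Form 2: flag and value joined with '='.
        if arg == FLAG + "=ray":
            i += 1
            continue
        cleaned.append(arg)
        i += 1
    return cleaned


if __name__ == "__main__":
    # "some/model" is a placeholder model name.
    cmd = ["vllm", "serve", "some/model",
           "--distributed-executor-backend", "ray", "--port", "8000"]
    print(strip_ray_backend(cmd))
```

Working on a token list rather than a raw string avoids accidentally cutting into an unrelated argument that happens to contain the word "ray"; any other backend value is left untouched.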
Raphael Amorim
751bc5a47a
Adding sample profile and profile loader
2026-02-02 10:25:53 -05:00
Eugene Rakhmatulin
133ed9cfb9
bumped up MXFP4 base image version
2026-01-31 16:17:58 -08:00
Eugene Rakhmatulin
4a4b4e7610
Fixed a bug where solo mode failed on a standalone Spark without configured RoCE.
2026-01-30 16:39:11 -08:00
Eugene Rakhmatulin
57c890b10c
Reduced MXFP4 container size
2026-01-30 15:18:42 -08:00
Eugene Rakhmatulin
be19675980
Fixed initial vllm source fetch if not using main branch
2026-01-30 11:24:51 -08:00
Eugene Rakhmatulin
3a68e1ca46
Fixed #25
2026-01-30 11:20:29 -08:00
Eugene Rakhmatulin
7d232a305a
Reverted to Torch 2.9.1 in the source build to address #24
2026-01-30 10:43:12 -08:00
Eugene Rakhmatulin
34bd3ae39c
Fixed fetching vllm source code in MXFP4 version.
2026-01-30 09:07:01 -08:00
Eugene Rakhmatulin
ef0f996df6
Bumped base image version; reverted Triton to 3.5.1
2026-01-29 23:14:43 -08:00
Eugene Rakhmatulin
ace61c2d55
added new mod for glm4.7-flash-awq, solo mode support.
2026-01-29 18:18:00 -08:00
Eugene Rakhmatulin
067bbbbb2d
Merge branch 'mxfp4'
2026-01-29 14:20:07 -08:00
Eugene Rakhmatulin
9a907caffc
mxfp4 dockerfile optimizations
2026-01-29 14:17:36 -08:00
Eugene Rakhmatulin
7a81e90cd2
added -e parameter
2026-01-29 13:06:22 -08:00
Eugene Rakhmatulin
53a8b45bcb
Added experimental MXFP4 optimizations
2026-01-29 11:56:17 -08:00
Eugene Rakhmatulin
b58ba7b19a
Added cubins and jit-cache
2026-01-29 11:42:04 -08:00
Eugene Rakhmatulin
36e3b7af27
Removed unnecessary dependencies
2026-01-29 09:58:44 -08:00
Eugene Rakhmatulin
e4b57633fe
moved everything to uv
2026-01-29 08:34:49 -08:00
Eugene Rakhmatulin
a3afb6f313
Merge branch 'main' into mxfp4
2026-01-28 13:25:26 -08:00
Eugene Rakhmatulin
74c02c37c2
warning message about wheel builds
2026-01-28 13:25:02 -08:00
Eugene Rakhmatulin
cef3727f26
Updated SHA for repos
2026-01-28 13:20:03 -08:00
Eugene Rakhmatulin
6b11902cc8
Updated README
2026-01-26 23:18:27 -08:00
Eugene Rakhmatulin
564afc1f6b
Working MXFP4 fork, updated build script
2026-01-26 22:31:46 -08:00
Eugene Rakhmatulin
90c8b30276
Merge branch 'main' into mxfp4
2026-01-26 16:17:58 -08:00
Eugene Rakhmatulin
e817f3dbec
Updated Triton version to 3.6.0
2026-01-26 14:24:58 -08:00
Eugene Rakhmatulin
aece2fad78
Initial import of MXFP4 branch
2026-01-24 22:40:36 -08:00
Eugene Rakhmatulin
25a16ef6c2
Fixed #11 and #12 - added a new dependency for OpenCV
2026-01-19 12:07:15 -08:00
Eugene Rakhmatulin
cd7678fe9f
Added MIT license
2026-01-13 19:38:24 +00:00
Eugene Rakhmatulin
18a25c8382
Updated README
2026-01-08 14:38:12 -08:00
Eugene Rakhmatulin
4ee090f632
Updated README re: hf-download option
2025-12-24 08:37:33 -08:00
Eugene Rakhmatulin
2a568481f0
Model download support
2025-12-24 00:30:15 -08:00
Eugene Rakhmatulin
04e6d27b84
Updated README re: mods functionality
2025-12-23 18:09:59 -08:00
Eugene Rakhmatulin
9ad61078ce
Added multiple mods support
2025-12-23 17:45:55 -08:00
Eugene Rakhmatulin
c90a6d0bde
Fixed remote docker execution
2025-12-23 13:49:38 -08:00
Eugene Rakhmatulin
19dec79c5c
initial mod implementation
2025-12-23 13:38:10 -08:00
Eugene Rakhmatulin
a9b1bb5947
fixed a bug with numpy version in wheels build when transformers 5 is used.
2025-12-21 22:53:31 -08:00
Eugene Rakhmatulin
1464b0dc8f
Display image name in launch-cluster.sh output
2025-12-21 22:44:01 -08:00
Eugene Rakhmatulin
786a50c5c7
Updated README
2025-12-21 22:41:48 -08:00
Eugene Rakhmatulin
1139a37324
Added transformers v5 support
2025-12-21 22:41:03 -08:00
Eugene Rakhmatulin
c37053adf6
Updated README
2025-12-21 14:57:35 -08:00
Eugene Rakhmatulin
82802f0cad
Added Quickstart section to README
2025-12-21 14:53:05 -08:00
Eugene Rakhmatulin
11db634aad
Switch to uv in the main Dockerfile
2025-12-21 13:28:40 -08:00
Eugene Rakhmatulin
bbd3469549
Support vLLM release wheels
2025-12-21 11:15:52 -08:00
Eugene Rakhmatulin
2aa545a810
Added PSA about build cache
2025-12-21 00:49:59 -08:00
Eugene Rakhmatulin
63a1a6a97c
Update README to reflect reduced build time and container size for vLLM
2025-12-20 23:16:12 -08:00
Eugene Rakhmatulin
dfe426e912
Add support for pre-release FlashInfer packages in Docker builds
2025-12-20 23:13:26 -08:00
Eugene Rakhmatulin
1b3968fe98
Merge branch 'flashinfer-0.6.0-pre'
2025-12-20 23:02:58 -08:00
Eugene Rakhmatulin
9f35dbdd2d
Reverted back to release flashinfer
2025-12-20 23:01:49 -08:00
Eugene Rakhmatulin
d5d85aaac7
Added optional flashinfer packages, using pre-release flashinfer
2025-12-20 22:56:40 -08:00