Commit Graph

177 Commits

Author SHA1 Message Date
Eugene Rakhmatulin
da4185cb12 Fixed an issue with fetching latest vLLM code 2026-02-11 22:35:49 -08:00
Eugene Rakhmatulin
3b1e49dcb0 Supporting other CUDA archs via --gpu-arch flag 2026-02-11 13:10:41 -08:00
Eugene Rakhmatulin
c6b245cfe8 Added prefix caching to nemotron recipe 2026-02-10 18:25:01 -08:00
Eugene Rakhmatulin
6d3f5dfd5c map flashinfer/torch/triton cache directories by default 2026-02-10 16:36:02 -08:00
Eugene Rakhmatulin
b990a1b8ac Fixed #37 2026-02-10 14:31:43 -08:00
Eugene Rakhmatulin
ace16f3a8f Applied new fastsafetensors fix to mxfp4 build; disabled wheel builds by default 2026-02-09 23:47:06 -08:00
Eugene Rakhmatulin
74876dd442 Added recipes for nemotron-nano-3 and qwen3-coder-next 2026-02-09 14:33:35 -08:00
Eugene Rakhmatulin
3aa5e5dce4 Merge pull request #34 2026-02-09 14:28:30 -08:00
Raphael Amorim
6943a51ced Adding tests and refactoring repeated methods 2026-02-09 17:21:32 -05:00
Raphael Amorim
d07ad5450f Adding solo_only option to the recipe 2026-02-09 17:03:57 -05:00
Eugene Rakhmatulin
2923fe6ea5 Removed temp fastsafetensors patch 2026-02-09 10:21:14 -08:00
Eugene Rakhmatulin
06e8817f18 Triton 3.6.0 is now default 2026-02-08 22:38:31 -08:00
Eugene Rakhmatulin
d845cd0401 changed arch to 12.1a again 2026-02-08 14:18:12 -08:00
Eugene Rakhmatulin
5bf422a2ca Merge branch 'main' into pytorch-base 2026-02-08 13:01:17 -08:00
Eugene Rakhmatulin
15c1506d0c Merge pull request #32 2026-02-08 07:17:20 -08:00
Raphael Amorim
b7c3cdcfcb Enhancement: add -- pass-through for arbitrary vLLM arguments
Implements Unix-style pass-through allowing any vLLM argument to be
passed after `--` separator. Arguments are appended verbatim to the
generated vLLM command.

Examples:
  ./run-recipe.py model --solo -- --load-format safetensors
  ./run-recipe.py model --solo -- --served-model-name my-api
  ./run-recipe.py model --solo -- -cc.cudagraph_mode=PIECEWISE

Features:
- Uses parse_known_args() to capture arguments after --
- Warns when extra args duplicate CLI overrides (--port, --tp, etc.)
- Works in both solo and cluster modes

Adds 10 integration tests covering:
- --load-format, --served-model-name, equals syntax
- Multiple arguments, empty --, cluster mode
- Duplicate detection warnings for port/tp/gpu-mem

Closes #30
2026-02-08 02:36:49 -05:00
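The pass-through described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in run-recipe.py. The wrapper's override options (`--port`, `--tp`, `--gpu-mem`), their mapping to vLLM flag names, and the helper functions are all hypothetical. The sketch also splits argv at the first bare `--` by hand instead of calling parse_known_args(), so that it does not depend on how argparse treats `--`.

```python
import argparse
import warnings

# Hypothetical mapping: vLLM flag seen after '--' -> wrapper option that also sets it.
OVERRIDDEN_VLLM_FLAGS = {
    "--port": "--port",
    "--tensor-parallel-size": "--tp",
    "-tp": "--tp",
    "--gpu-memory-utilization": "--gpu-mem",
}


def split_passthrough(argv):
    """Split argv at the first bare '--' into (wrapper_args, passthrough_args)."""
    if "--" in argv:
        idx = argv.index("--")
        return argv[:idx], argv[idx + 1:]
    return argv, []


def parse_args(argv):
    wrapper_args, extra = split_passthrough(argv)
    parser = argparse.ArgumentParser(prog="run-recipe.py")
    parser.add_argument("recipe")
    parser.add_argument("--solo", action="store_true")
    # Hypothetical override options; the real script's option names may differ.
    parser.add_argument("--port", type=int)
    parser.add_argument("--tp", type=int)
    parser.add_argument("--gpu-mem", type=float)
    return parser.parse_args(wrapper_args), extra


def duplicated_overrides(extra):
    """Return wrapper options whose vLLM equivalent also appears after '--'."""
    hits = []
    for token in extra:
        flag = token.split("=", 1)[0]  # treat '--port=9000' like '--port 9000'
        if flag in OVERRIDDEN_VLLM_FLAGS:
            hits.append(OVERRIDDEN_VLLM_FLAGS[flag])
    return hits


def build_command(model, args, extra):
    cmd = ["vllm", "serve", model]
    if args.port is not None:
        cmd += ["--port", str(args.port)]
    if args.tp is not None:
        cmd += ["--tensor-parallel-size", str(args.tp)]
    if args.gpu_mem is not None:
        cmd += ["--gpu-memory-utilization", str(args.gpu_mem)]
    for option in duplicated_overrides(extra):
        warnings.warn(f"{option} is also set by arguments after '--'")
    return cmd + extra  # pass-through arguments are appended verbatim


if __name__ == "__main__":
    args, extra = parse_args(["model", "--solo", "--", "--served-model-name", "my-api"])
    print(build_command("MODEL_ID", args, extra))
```

Run as a script, this prints `['vllm', 'serve', 'MODEL_ID', '--served-model-name', 'my-api']`. Splitting before argparse sees the arguments keeps everything after `--` opaque to the wrapper, which is what lets unknown vLLM flags such as `-cc.cudagraph_mode=PIECEWISE` pass through unchanged.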
Eugene Rakhmatulin
dfb300e51a Merge branch 'main' into pytorch-base 2026-02-05 13:54:12 -08:00
Eugene Rakhmatulin
8cb956b972 Updated networking guide 2026-02-05 13:53:57 -08:00
Eugene Rakhmatulin
66210e641d Merge branch 'main' into pytorch-base 2026-02-04 12:07:06 -08:00
Eugene Rakhmatulin
f139c4b55d Updated tests 2026-02-04 12:06:30 -08:00
Eugene Rakhmatulin
c7d45157e0 Merge pull request #19 2026-02-04 12:03:20 -08:00
Eugene Rakhmatulin
ec987259a0 Recipes and Launch Script support 2026-02-04 12:01:53 -08:00
Eugene Rakhmatulin
ef6a5eca29 Merge branch 'main' into pr-19 2026-02-04 11:36:59 -08:00
Eugene Rakhmatulin
f7830636af Cleaning up launch-cluster changes 2026-02-04 11:36:55 -08:00
Raphael Amorim
b1516f688a fix: Allow PR tests from any branch and add manual trigger 2026-02-03 17:42:09 -05:00
Raphael Amorim
28ba6090fc Adding suggestions from Eugr and unit tests 2026-02-03 17:32:59 -05:00
Eugene Rakhmatulin
d8e183cc9b Merge branch 'apply-pr' into pytorch-base 2026-02-03 14:17:46 -08:00
Eugene Rakhmatulin
c42cc56d34 bugfix 2026-02-03 14:17:30 -08:00
Eugene Rakhmatulin
79e646e833 Merge branch 'apply-pr' into pytorch-base 2026-02-03 14:14:45 -08:00
Eugene Rakhmatulin
d7e9f17c2e vLLM build-time PRs support 2026-02-03 14:14:11 -08:00
Eugene Rakhmatulin
1e5aa060b8 Updated README to include networking guide 2026-02-03 14:14:05 -08:00
Raphael Amorim
30f16f1d4e feat: Add recipe-based one-click model deployment system
Introduces a YAML recipe system for simplified model deployment:

- run-recipe.py: Main script handling build, download, and launch
- run-recipe.sh: Bash wrapper for dependency management
- recipes/: Pre-configured recipes for common models
  - glm-4.7-flash-awq.yaml: GLM-4.7-Flash with AWQ quantization
  - glm-4.7-nvfp4.yaml: GLM-4.7 with NVFP4 (cluster-only)
  - minimax-m2-awq.yaml: MiniMax M2 with AWQ
  - openai-gpt-oss-120b.yaml: OpenAI GPT-OSS 120B with MXFP4

Key features:
- Auto-discover cluster nodes with --discover, saves to .env
- Load nodes from .env automatically on subsequent runs
- cluster_only flag for models requiring multi-node setup
- build_args field for Dockerfile selection (--pre-tf, --exp-mxfp4)
- Solo mode auto-strips --distributed-executor-backend ray
- --setup flag for full build + download + run workflow
- --dry-run to preview execution without running

Usage:
  ./run-recipe.sh --discover           # Find and save cluster nodes
  ./run-recipe.sh glm-4.7-flash-awq --solo --setup
  ./run-recipe.sh glm-4.7-nvfp4 --setup  # Uses nodes from .env
2026-02-03 16:09:12 -05:00
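Two of the recipe rules listed above, the cluster_only check and solo-mode stripping of `--distributed-executor-backend ray`, can be sketched as follows. This is a minimal illustration under stated assumptions: the recipe is shown as an already-loaded dict rather than parsed YAML, and only `cluster_only` comes from the commit message. The `name` and `command` fields and the `strip_flag`/`resolve_command` helpers are hypothetical.

```python
from typing import List


def strip_flag(args: List[str], flag: str, value: str) -> List[str]:
    """Remove both 'flag value' and 'flag=value' forms from an argument list."""
    out = []
    i = 0
    while i < len(args):
        tok = args[i]
        if tok == flag and i + 1 < len(args) and args[i + 1] == value:
            i += 2  # skip the flag and its separate value
            continue
        if tok == f"{flag}={value}":
            i += 1  # skip the combined '--flag=value' form
            continue
        out.append(tok)
        i += 1
    return out


def resolve_command(recipe: dict, solo: bool) -> List[str]:
    """Build the launch command for a recipe (hypothetical 'command' field)."""
    if solo and recipe.get("cluster_only", False):
        raise ValueError(f"recipe {recipe['name']!r} is cluster_only; it cannot run with --solo")
    cmd = list(recipe["command"])  # copy so the recipe itself is not modified
    if solo:
        # Assumed rationale: solo mode runs without the multi-node Ray cluster,
        # so the Ray executor backend flag is dropped.
        cmd = strip_flag(cmd, "--distributed-executor-backend", "ray")
    return cmd


if __name__ == "__main__":
    recipe = {
        "name": "glm-4.7-flash-awq",
        "command": ["vllm", "serve", "MODEL_ID",
                    "--distributed-executor-backend", "ray", "--port", "8000"],
    }
    print(resolve_command(recipe, solo=True))
```

Run as a script, this prints `['vllm', 'serve', 'MODEL_ID', '--port', '8000']`. Keeping a single Ray-based command in the recipe and stripping the backend flag at launch time lets one recipe serve both solo and cluster runs, while cluster_only recipes fail early instead of starting a misconfigured single-node launch.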
Eugene Rakhmatulin
ecf7f5f7b5 Merge branch 'main' into pytorch-base 2026-02-03 12:55:03 -08:00
Eugene Rakhmatulin
f8eb294c58 Updated README.md and added Networking Guide. 2026-02-03 12:54:38 -08:00
Eugene Rakhmatulin
4b9ab0de7c Added ability to launch NGC container in the cluster 2026-02-02 16:57:04 -08:00
Eugene Rakhmatulin
997bf9ea0e Merge branch 'main' into pytorch-base 2026-02-02 12:44:15 -08:00
Eugene Rakhmatulin
4634ee92a2 Added a mod for Nemotron Nano 2026-02-02 11:58:07 -08:00
Eugene Rakhmatulin
37953478f0 changed arch codes again to be in line with upcoming PR 2026-02-02 09:21:48 -08:00
Raphael Amorim
751bc5a47a Adding sample profile and profile loader 2026-02-02 10:25:53 -05:00
Eugene Rakhmatulin
3c7f91081d changed arch flags 2026-02-01 16:37:01 -08:00
Eugene Rakhmatulin
5f7d480801 Reverted Triton removal to use system triton package 2026-01-31 23:23:59 -08:00
Eugene Rakhmatulin
133ed9cfb9 bumped up MXFP4 base image version 2026-01-31 16:17:58 -08:00
Eugene Rakhmatulin
c81edce091 bumped up MXFP4 base image version 2026-01-31 16:12:33 -08:00
Eugene Rakhmatulin
9691eed1b0 Disabled Triton build for now 2026-01-31 00:10:52 -08:00
Eugene Rakhmatulin
7c61b4057c Added Triton compilation to custom build 2026-01-30 23:44:20 -08:00
Eugene Rakhmatulin
0482435848 Restore previous wheels build 2026-01-30 18:43:39 -08:00
Eugene Rakhmatulin
a6d6bafa69 Merge branch 'main' into pytorch-base 2026-01-30 17:06:29 -08:00
Eugene Rakhmatulin
4a4b4e7610 Fixed a bug when solo mode failed on a standalone Spark without configured RoCE. 2026-01-30 16:39:11 -08:00
Eugene Rakhmatulin
a4b524625a using "from scratch" build for wheels to reduce image size 2026-01-30 16:29:47 -08:00
Eugene Rakhmatulin
518dc0108b moved deps buster 2026-01-30 15:25:54 -08:00