Recipes and Launch Script support

Eugene Rakhmatulin
2026-02-04 12:01:53 -08:00
parent ef6a5eca29
commit ec987259a0
3 changed files with 57 additions and 4 deletions


@@ -159,6 +159,58 @@ Don't do it every time you rebuild, because it will slow down compilation times.
For periodic maintenance, I recommend using a filter: `docker builder prune --filter until=72h`

### 2026-02-04
#### Recipes support
A major contribution from @raphaelamorim: model recipes.
Recipes let you launch models with preconfigured settings in a single command.
Example:
```bash
# List available recipes
./run-recipe.sh --list
# Run a recipe in solo mode (single node)
./run-recipe.sh glm-4.7-flash-awq --solo
# Full setup: build container + download model + run
./run-recipe.sh glm-4.7-flash-awq --solo --setup
# Run with overrides
./run-recipe.sh glm-4.7-flash-awq --solo --port 9000 --gpu-mem 0.8
# Cluster deployment
./run-recipe.sh glm-4.7-nvfp4 --setup
```
Please refer to the [documentation](recipes/README.md) for details.
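For orientation, here is a minimal sketch of a recipe file. The `mods` and `defaults` fields are taken from the recipes changed in this commit; the file name and the `model` key are illustrative assumptions, so treat recipes/README.md as the authoritative schema:
```yaml
# Hypothetical recipe, e.g. recipes/my-model.yaml (the name and the `model`
# key are assumptions for illustration; the other fields appear in the
# recipes shipped with this commit).
model: org/model-AWQ             # assumed key: which model this recipe launches

mods: []                         # mods to apply before launch; empty means none

# Default settings (can be overridden via CLI, e.g. --port, --gpu-mem)
defaults:
  port: 8000                     # OpenAI-compatible API port
  host: 0.0.0.0                  # listen on all interfaces
  tensor_parallel: 1             # tensor-parallel size
  gpu_memory_utilization: 0.7    # fraction of GPU memory vLLM may claim
```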
#### Launch script option
You can now specify a launch script to execute on the head node, instead of passing a command directly via the `exec` action.
Example:
```bash
./launch-cluster.sh --launch-script examples/vllm-openai-gpt-oss-120b.sh
```
Thanks @raphaelamorim for the contribution!
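A launch script is an ordinary bash script that runs inside the container on the head node. As a rough sketch of what such a script looks like (the `vllm serve` invocation mirrors the README example further below; the exact flags are illustrative assumptions, not the shipped script's contents):
```bash
#!/bin/bash
# Illustrative launch script in the spirit of examples/vllm-openai-gpt-oss-120b.sh.
# It runs inside the container on the head node; the serve flags below are
# assumptions for illustration, not the shipped script's exact contents.
set -euo pipefail

vllm serve openai/gpt-oss-120b \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 2
```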
#### Ability to apply vLLM PRs during build
`./build-and-copy.sh` now supports applying vLLM PRs during a build. The PR is applied on top of the most recent vLLM commit (or the specific `vllm-ref`, if set). This does NOT apply to the wheels build or the MXFP4 special build!
To use it, specify `--apply-vllm-pr <pr_num>` in the arguments. Note that the apply may fail if the PR needs a rebase against the specified vLLM reference or main branch. Use with caution!
Example:
```bash
./build-and-copy.sh -t vllm-node-20260204-pr31740 --apply-vllm-pr 31740 -c
```
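For context on why a rebase can be needed: applying a GitHub PR boils down to fetching the PR's head ref and merging or rebasing it onto the build ref. A generic sketch of that mechanism, not the script's actual implementation:
```bash
# Generic mechanics of applying a GitHub PR to a local vLLM checkout
# (illustrative only; not build-and-copy.sh's internal logic).
# GitHub publishes every PR's head commit under refs/pull/<num>/head.
git -C vllm fetch origin pull/31740/head:pr-31740
git -C vllm merge --no-edit pr-31740  # conflicts here mean the PR needs a rebase
```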
### 2026-02-02
#### Nemotron Nano mod
```diff
@@ -671,6 +723,7 @@ You can override the auto-detected values if needed:
 | `--nccl-debug` | NCCL debug level (e.g., INFO, WARN). Defaults to INFO if flag is present but value is omitted. |
 | `--check-config` | Check configuration and auto-detection without launching. |
 | `--solo` | Solo mode: skip autodetection, launch only on current node, do not launch Ray cluster |
+| `--launch-script` | Path to bash script to execute in the container (from examples/ directory or absolute path). If launch script is specified, action should be omitted. |
 | `-d` | Run in daemon mode (detached). |

 ## 3. Running the Container (Manual)
@@ -887,13 +940,13 @@ vllm serve openai/gpt-oss-120b \
 ### Available Launch Scripts

-The `profiles/` directory contains ready-to-use launch scripts:
+The `examples/` directory contains ready-to-use launch scripts:

 - **example-vllm-minimax.sh** - MiniMax-M2-AWQ with Ray distributed backend
 - **vllm-openai-gpt-oss-120b.sh** - OpenAI GPT-OSS 120B with FlashInfer MOE
 - **vllm-glm-4.7-nvfp4.sh** - GLM-4.7-NVFP4 (requires the glm4_moe patch mod)

-See [profiles/README.md](profiles/README.md) for detailed documentation and more examples.
+See [examples/README.md](examples/README.md) for detailed documentation and more examples.

 ## 8. Using cluster mode for inference
```


```diff
@@ -33,7 +33,7 @@ mods:
 # Default settings (can be overridden via CLI)
 defaults:
-  port: 8888
+  port: 8000
   host: 0.0.0.0
   tensor_parallel: 1
   gpu_memory_utilization: 0.7
```


```diff
@@ -20,7 +20,7 @@ mods: []
 # Default settings (can be overridden via CLI)
 defaults:
-  port: 8888
+  port: 8000
   host: 0.0.0.0
   tensor_parallel: 2
   gpu_memory_utilization: 0.70
```