Arguments containing shell-special characters (e.g. JSON strings) were
passed unquoted, which broke commands such as:
--speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
Escape arguments with printf %q in launch-cluster.sh and shlex.quote()
in run-recipe.py so they reach vLLM intact.
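A minimal sketch of the shlex.quote() side of the fix, assuming the command is assembled as a single shell string. The helper name build_command and the "vllm serve model" prefix are illustrative and are not taken from run-recipe.py:

```python
import shlex


def build_command(base, extra_args):
    """Join a command and its arguments into one shell-safe string.

    Hypothetical helper: every part is passed through shlex.quote(), so
    quotes, braces and spaces survive a later round of shell parsing.
    """
    return " ".join(shlex.quote(part) for part in [*base, *extra_args])


spec = '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
command = build_command(
    ["vllm", "serve", "model"],  # illustrative command prefix
    ["--speculative-config", spec],
)
print(command)

# Round trip: POSIX-style splitting recovers the JSON argument unchanged.
assert shlex.split(command)[-1] == spec
```

Plain flags such as --speculative-config contain only safe characters and are left as-is; only the JSON value gets wrapped in single quotes.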
Implements Unix-style pass-through: any vLLM argument can be passed
after the `--` separator. These arguments are appended verbatim to the
generated vLLM command.
Examples:
./run-recipe.py model --solo -- --load-format safetensors
./run-recipe.py model --solo -- --served-model-name my-api
./run-recipe.py model --solo -- -cc.cudagraph_mode=PIECEWISE
Features:
- Uses parse_known_args() to capture arguments after --
- Warns when extra args duplicate CLI overrides (--port, --tp, etc.)
- Works in both solo and cluster modes
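The pass-through and duplicate warning can be sketched roughly as follows. This is not the code from run-recipe.py: it splits argv manually at the first `--` instead of using parse_known_args(), to keep the example independent of argparse version behavior. The OVERRIDE_FLAGS mapping and both function names are assumptions made for illustration:

```python
# Hypothetical map from a recipe-level override to the vLLM flags it sets.
OVERRIDE_FLAGS = {
    "--port": ("--port",),
    "--tp": ("--tensor-parallel-size", "-tp"),
    "--gpu-mem": ("--gpu-memory-utilization",),
}


def split_passthrough(argv):
    """Split argv at the first `--` into (own_args, extra_args)."""
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return list(argv), []


def duplicated_overrides(own_args, extra_args):
    """Return overrides that are set both before and after `--`."""
    # Normalise `--flag=value` to `--flag` before comparing.
    own_flags = {a.split("=", 1)[0] for a in own_args if a.startswith("-")}
    extra_flags = {a.split("=", 1)[0] for a in extra_args if a.startswith("-")}
    return sorted(
        override
        for override, vllm_flags in OVERRIDE_FLAGS.items()
        if override in own_flags and extra_flags & set(vllm_flags)
    )


own, extra = split_passthrough(
    ["model", "--solo", "--port", "9000", "--", "--port", "8000",
     "--load-format", "safetensors"]
)
for override in duplicated_overrides(own, extra):
    print(f"warning: {override} is also set after --")
```

An empty `--` yields an empty extra list, so nothing is appended to the generated command in that case.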
Adds 10 integration tests covering:
- --load-format, --served-model-name, equals syntax
- Multiple arguments, empty --, cluster mode
- Duplicate detection warnings for port/tp/gpu-mem
Closes #30
Introduces a YAML recipe system for simplified model deployment:
- run-recipe.py: Main script handling build, download, and launch
- run-recipe.sh: Bash wrapper for dependency management
- recipes/: Pre-configured recipes for common models
  - glm-4.7-flash-awq.yaml: GLM-4.7-Flash with AWQ quantization
  - glm-4.7-nvfp4.yaml: GLM-4.7 with NVFP4 (cluster-only)
  - minimax-m2-awq.yaml: MiniMax M2 with AWQ
  - openai-gpt-oss-120b.yaml: OpenAI GPT-OSS 120B with MXFP4
Key features:
- Auto-discovers cluster nodes with --discover and saves them to .env
- Loads nodes from .env automatically on subsequent runs
- cluster_only flag for models requiring multi-node setup
- build_args field for Dockerfile selection (--pre-tf, --exp-mxfp4)
- Solo mode auto-strips --distributed-executor-backend ray
- --setup flag for full build + download + run workflow
- --dry-run to preview execution without running
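As an illustration of the solo-mode stripping listed above, here is a minimal sketch that assumes the vLLM command is held as a list of arguments. The function name strip_flag is hypothetical, and unlike the behavior described in the commit it drops the flag with any value, not only ray:

```python
def strip_flag(args, flag):
    """Return args without `flag`, dropping its value in both spellings.

    Handles `--flag value` (two list items) and `--flag=value` (one item).
    """
    result = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This item is the value belonging to the flag just removed.
            skip_next = False
            continue
        if arg == flag:
            skip_next = True
            continue
        if arg.startswith(flag + "="):
            continue
        result.append(arg)
    return result


cluster_args = [
    "--tensor-parallel-size", "1",
    "--distributed-executor-backend", "ray",
    "--port", "8000",
]
solo_args = strip_flag(cluster_args, "--distributed-executor-backend")
print(solo_args)
```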
Usage:
./run-recipe.sh --discover # Find and save cluster nodes
./run-recipe.sh glm-4.7-flash-awq --solo --setup
./run-recipe.sh glm-4.7-nvfp4 --setup # Uses nodes from .env