# Launch Scripts

This directory contains bash scripts that can be executed in the container using the `--launch-script` option. A launch script is a plain, executable bash file that runs directly inside the container.

## Why Launch Scripts?

- **Simple** - Just write a bash script that runs your command
- **Flexible** - Use any bash features: environment variables, conditionals, loops
- **Standalone** - Each script can be tested directly on a head node
- **No magic** - What you see is what gets executed

## Usage

```bash
# Use a launch script by name (looks in profiles/ directory)
./launch-cluster.sh --launch-script example-vllm-minimax

# Use a launch script by filename
./launch-cluster.sh --launch-script example-vllm-minimax.sh

# Use a launch script with an absolute path
./launch-cluster.sh --launch-script /path/to/my-script.sh

# Combine with mods if needed
./launch-cluster.sh --launch-script my-script.sh --apply-mod mods/my-patch

# Combine with other options
./launch-cluster.sh -n 192.168.1.1,192.168.1.2 --launch-script my-model.sh -d
```

When using `--launch-script`, the `exec` action is implied automatically if no action is specified.

## Script Structure

Launch scripts are ordinary bash scripts. The script is copied into the container at `/workspace/exec-script.sh` and executed.

```bash
#!/bin/bash
# PROFILE: Human-readable name
# DESCRIPTION: What this script does

# Optional: Set environment variables
export MY_VAR="value"

# Run your command
vllm serve org/model-name \
    --port 8000 \
    --host 0.0.0.0 \
    --gpu-memory-utilization 0.7
```

### Metadata Comments

The `# PROFILE:` and `# DESCRIPTION:` comments are optional but recommended for documentation:

```bash
#!/bin/bash
# PROFILE: MiniMax-M2-AWQ Example
# DESCRIPTION: vLLM serving MiniMax-M2-AWQ with Ray distributed backend
```

## Examples

### Basic vLLM Serving

```bash
#!/bin/bash
# PROFILE: MiniMax-M2-AWQ
# DESCRIPTION: vLLM serving MiniMax-M2-AWQ with Ray distributed backend

vllm serve QuantTrio/MiniMax-M2-AWQ \
    --port 8000 \
    --host 0.0.0.0 \
    --gpu-memory-utilization 0.7 \
    -tp 2 \
    --distributed-executor-backend ray \
    --max-model-len 128000 \
    --load-format fastsafetensors \
    --enable-auto-tool-choice \
    --tool-call-parser minimax_m2
```

### With Environment Variables

```bash
#!/bin/bash
# PROFILE: OpenAI GPT-OSS 120B
# DESCRIPTION: vLLM serving openai/gpt-oss-120b with FlashInfer MOE optimization

# Enable FlashInfer MOE with MXFP4/MXFP8 quantization
export VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8=1

vllm serve openai/gpt-oss-120b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --tensor-parallel-size 2 \
    --distributed-executor-backend ray \
    --host 0.0.0.0 \
    --port 8000
```

### With Conditional Logic

```bash
#!/bin/bash
# PROFILE: Adaptive Model Server
# DESCRIPTION: Adjusts settings based on available GPUs

GPU_COUNT=$(nvidia-smi -L | wc -l)
echo "Detected $GPU_COUNT GPUs"

if [[ $GPU_COUNT -ge 4 ]]; then
    TP_SIZE=4
    MEM_UTIL=0.9
else
    TP_SIZE=2
    MEM_UTIL=0.7
fi

vllm serve meta-llama/Llama-3.1-70B-Instruct \
    --port 8000 \
    --host 0.0.0.0 \
    -tp $TP_SIZE \
    --gpu-memory-utilization $MEM_UTIL \
    --distributed-executor-backend ray
```

### SGLang

```bash
#!/bin/bash
# PROFILE: SGLang Llama 3.1
# DESCRIPTION: SGLang runtime with Llama 3.1

python -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --port 8000 \
    --host 0.0.0.0 \
    --tp 2
```

### With a Model Requiring Patches

If your model requires patches, use `--apply-mod` alongside `--launch-script`:

```bash
# Script: vllm-glm-4.7-nvfp4.sh
#!/bin/bash
# PROFILE: Salyut1/GLM-4.7-NVFP4
# DESCRIPTION: vLLM serving GLM-4.7-NVFP4
# NOTE: Requires --apply-mod mods/fix-Salyut1-GLM-4.7-NVFP4

vllm serve Salyut1/GLM-4.7-NVFP4 \
    --attention-config.backend flashinfer \
    --tool-call-parser glm47 \
    -tp 2 \
    --host 0.0.0.0 \
    --port 8000
```

Usage:

```bash
./launch-cluster.sh --launch-script vllm-glm-4.7-nvfp4.sh --apply-mod mods/fix-Salyut1-GLM-4.7-NVFP4 exec
```

## Creating a New Launch Script

1. Create a new `.sh` file in this directory
2. Add the shebang `#!/bin/bash`
3. Add `# PROFILE:` and `# DESCRIPTION:` comments
4. Write your command (e.g., `vllm serve ...`)
5. Run with `./launch-cluster.sh --launch-script my-script.sh exec`

## Testing Scripts

Since launch scripts are standard bash files, you can test them directly:

```bash
# Inside a running container or on a head node with the runtime installed
cd profiles
./my-script.sh
```

This makes development and debugging much easier than it would be with a complex configuration system.
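For a quicker check that doesn't actually launch a model, ordinary shell tooling is enough. This is a minimal sketch: `bash -n` only parses the script, and `shellcheck` is an optional extra that is assumed to be installed separately.

```bash
# Syntax-check a script without executing anything
bash -n profiles/my-script.sh

# Optional static analysis, if shellcheck is available
shellcheck profiles/my-script.sh
```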
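Once a serving script is up, the examples above all bind an OpenAI-compatible API to port 8000, so a basic liveness check from a machine that can reach the head node might look like the following. The hostname is a placeholder for your head node's address.

```bash
# List the models the server is exposing; vLLM and SGLang both provide
# an OpenAI-compatible /v1/models endpoint
curl http://HEAD_NODE_IP:8000/v1/models
```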