
Example Launch Scripts

This directory contains example bash scripts that demonstrate how to use the --launch-script option directly with launch-cluster.sh.

Note: For most use cases, the recipe system (./run-recipe.sh) is the recommended approach. These examples are provided for reference and for advanced users who need direct control over the launch process.

Why Launch Scripts?

  • Simple - Just write a bash script that runs your command
  • Flexible - Use any bash features: environment variables, conditionals, loops
  • Standalone - Each script can be tested directly on a head node
  • No magic - What you see is what gets executed

Usage

# Use a launch script by name (looked up in the examples/ directory)
./launch-cluster.sh --launch-script example-vllm-minimax

# Use a launch script by filename
./launch-cluster.sh --launch-script example-vllm-minimax.sh

# Use a launch script with absolute path
./launch-cluster.sh --launch-script /path/to/my-script.sh

# Combine with mods if needed
./launch-cluster.sh --launch-script my-script.sh --apply-mod mods/my-patch

# Combine with other options
./launch-cluster.sh -n 192.168.1.1,192.168.1.2 --launch-script my-model.sh -d

When using --launch-script, the exec action is automatically implied if no action is specified.

Script Structure

Launch scripts are simple bash scripts. The script is copied into the container at /workspace/exec-script.sh and executed.

#!/bin/bash
# PROFILE: Human-readable name
# DESCRIPTION: What this script does

# Optional: Set environment variables
export MY_VAR="value"

# Run your command
vllm serve org/model-name \
    --port 8000 \
    --host 0.0.0.0 \
    --gpu-memory-utilization 0.7

Metadata Comments

The # PROFILE: and # DESCRIPTION: comments are optional but recommended for documentation:

#!/bin/bash
# PROFILE: MiniMax-M2-AWQ Example
# DESCRIPTION: vLLM serving MiniMax-M2-AWQ with Ray distributed backend
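
Because the metadata lives in ordinary comments, it can be read back with standard tools. The following is a minimal sketch of one way to do that; script_meta is a hypothetical helper, not part of launch-cluster.sh, and the launcher itself may parse these comments differently or not at all.

```shell
#!/bin/bash
# Hypothetical helper (not part of launch-cluster.sh): print the value of a
# "# KEY: value" metadata comment from a launch script.
# Usage: script_meta PROFILE path/to/script.sh
script_meta() {
    local key="$1" file="$2"
    # Strip the "# KEY: " prefix from matching lines and keep the first match.
    sed -n "s/^# *${key}: *//p" "$file" | head -n 1
}

# Example: list the profile name of every script in the current directory.
#   for f in ./*.sh; do printf '%s: %s\n' "$f" "$(script_meta PROFILE "$f")"; done
```

A key that is absent produces empty output, which matches the comments being optional.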

Examples

Basic vLLM Serving

#!/bin/bash
# PROFILE: MiniMax-M2-AWQ
# DESCRIPTION: vLLM serving MiniMax-M2-AWQ with Ray distributed backend

vllm serve QuantTrio/MiniMax-M2-AWQ \
    --port 8000 \
    --host 0.0.0.0 \
    --gpu-memory-utilization 0.7 \
    -tp 2 \
    --distributed-executor-backend ray \
    --max-model-len 128000 \
    --load-format fastsafetensors \
    --enable-auto-tool-choice \
    --tool-call-parser minimax_m2

With Environment Variables

#!/bin/bash
# PROFILE: OpenAI GPT-OSS 120B
# DESCRIPTION: vLLM serving openai/gpt-oss-120b with FlashInfer MOE optimization

# Enable FlashInfer MOE with MXFP4/MXFP8 quantization
export VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8=1

vllm serve openai/gpt-oss-120b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --tensor-parallel-size 2 \
    --distributed-executor-backend ray \
    --host 0.0.0.0 \
    --port 8000

With Conditional Logic

#!/bin/bash
# PROFILE: Adaptive Model Server
# DESCRIPTION: Adjusts settings based on available GPUs

GPU_COUNT=$(nvidia-smi -L | wc -l)
echo "Detected $GPU_COUNT GPUs"

if [[ $GPU_COUNT -ge 4 ]]; then
    TP_SIZE=4
    MEM_UTIL=0.9
else
    TP_SIZE=2
    MEM_UTIL=0.7
fi

vllm serve meta-llama/Llama-3.1-70B-Instruct \
    --port 8000 \
    --host 0.0.0.0 \
    -tp "$TP_SIZE" \
    --gpu-memory-utilization "$MEM_UTIL" \
    --distributed-executor-backend ray
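
The branching above can only be exercised on a machine that has nvidia-smi. One possible variation, sketched here as an assumption rather than as how the shipped examples are written, moves the decision into a function of the GPU count so it can be checked anywhere. The thresholds are the ones from the example; pick_settings and the GPU_COUNT override are hypothetical names.

```shell
#!/bin/bash
# Hypothetical refactor of the example above: same thresholds, but the
# decision depends only on its argument, so it can be checked without GPUs.
pick_settings() {
    local gpu_count="$1"
    if [[ "$gpu_count" -ge 4 ]]; then
        echo "4 0.9"   # TP_SIZE MEM_UTIL
    else
        echo "2 0.7"
    fi
}

# GPU_COUNT may be preset for a dry run; otherwise count the lines nvidia-smi prints.
GPU_COUNT="${GPU_COUNT:-$(nvidia-smi -L 2>/dev/null | wc -l)}"
read -r TP_SIZE MEM_UTIL <<< "$(pick_settings "$GPU_COUNT")"
echo "GPUs: $GPU_COUNT -> -tp $TP_SIZE --gpu-memory-utilization $MEM_UTIL"
```

The resulting TP_SIZE and MEM_UTIL can then be passed to vllm serve exactly as in the example.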

SGLang

#!/bin/bash
# PROFILE: SGLang Llama 3.1
# DESCRIPTION: SGLang runtime with Llama 3.1

python3 -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --port 8000 \
    --host 0.0.0.0 \
    --tp 2

With Model Requiring Patches

If your model requires patches, use --apply-mod alongside --launch-script:

#!/bin/bash
# Script: vllm-glm-4.7-nvfp4.sh
# PROFILE: Salyut1/GLM-4.7-NVFP4
# DESCRIPTION: vLLM serving GLM-4.7-NVFP4
# NOTE: Requires --apply-mod mods/fix-Salyut1-GLM-4.7-NVFP4

vllm serve Salyut1/GLM-4.7-NVFP4 \
    --attention-config.backend flashinfer \
    --tool-call-parser glm47 \
    -tp 2 \
    --host 0.0.0.0 \
    --port 8000

Usage:

./launch-cluster.sh --launch-script vllm-glm-4.7-nvfp4.sh --apply-mod mods/fix-Salyut1-GLM-4.7-NVFP4 exec

Creating a New Launch Script

  1. Create a new .sh file in this directory
  2. Add the shebang #!/bin/bash
  3. Add the optional (but recommended) # PROFILE: and # DESCRIPTION: comments
  4. Write your command (e.g., vllm serve ...)
  5. Run with ./launch-cluster.sh --launch-script my-script.sh exec
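
Steps 1 to 4 can be sketched as a small scaffold. This is an illustration only: new_launch_script is a hypothetical helper that does not exist in the repository, and the generated body is a placeholder to be replaced with a real command.

```shell
#!/bin/bash
# Hypothetical scaffold (not part of the repository): write a launch script
# skeleton with the shebang and metadata comments, then mark it executable.
# Usage: new_launch_script my-script.sh "Profile name" "What it does"
new_launch_script() {
    local path="$1" profile="$2" description="$3"
    if [[ -e "$path" ]]; then
        echo "refusing to overwrite $path" >&2
        return 1
    fi
    cat > "$path" <<EOF
#!/bin/bash
# PROFILE: ${profile}
# DESCRIPTION: ${description}

# Replace this placeholder with your command, e.g. vllm serve ...
echo "TODO: add the serve command" >&2
EOF
    chmod +x "$path"
}
```

Marking the file executable matters only when the script is run by hand; with --launch-script the file is copied into the container and executed there.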

Testing Scripts

Since launch scripts are standard bash files, you can test them directly:

# Inside a running container or on a head node with the runtime installed
cd examples
bash ./my-script.sh

Because the launch script is exactly what gets executed, you can develop and debug it by hand before involving launch-cluster.sh.
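
A cheaper check that needs neither a container nor a GPU is bash's own parser: bash -n reads a script without executing it. The sketch below assumes nothing about launch-cluster.sh; check_scripts is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: syntax-check launch scripts without executing them.
# `bash -n` only parses the file, so it catches syntax errors (unclosed
# quotes, a missing `fi`) but not wrong flags or missing commands.
check_scripts() {
    local status=0 f
    for f in "$@"; do
        if bash -n "$f" 2>/dev/null; then
            echo "ok    $f"
        else
            echo "FAIL  $f"
            status=1
        fi
    done
    return "$status"
}

# Example: check_scripts ./*.sh
```

The function returns non-zero if any script fails to parse, so it can gate a launch or a commit hook.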