Move recipe to 4x-spark-cluster/ and add UMA memory optimizations

- Move qwen3.5-397b-int4-autoround.yaml to recipes/4x-spark-cluster/ per
  maintainer request (multi-node recipes in separate directory)
- Add PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to recipe env
- Optimize Ray for GB10 UMA (128GB shared CPU/GPU memory):
  - Disable Ray dashboard (saves ~1.2 GiB per node)
  - Limit Ray object store to 1 GiB (default 30% of RAM = 33 GiB)
  - Disable pre-started idle workers (saves ~8 GiB on head node)
  - Set --num-cpus 2 and --disable-usage-stats on all nodes
- Net effect: ~40+ GiB freed across 4-node cluster for model/KV cache
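
For illustration, the Ray settings listed above could map onto `ray start` flags roughly as follows. This is a sketch, not the launch script touched by this commit: the function names and the head address `10.0.0.1:6379` are hypothetical, passing `--include-dashboard=false` on the head node only is an assumption, and the setting that disables pre-started idle workers is omitted because the message does not name it. The script only prints the commands, so it runs without Ray installed.

```shell
#!/usr/bin/env bash
# Sketch only: prints candidate `ray start` commands instead of running them.
set -euo pipefail

# 1 GiB cap on the Ray object store, expressed in bytes.
OBJECT_STORE_BYTES=$((1024 * 1024 * 1024))

# Flags applied to every node, per the commit message.
COMMON_FLAGS="--num-cpus 2 --disable-usage-stats --object-store-memory ${OBJECT_STORE_BYTES}"

ray_head_cmd() {
  # Assumption: the dashboard is switched off via the head node's flag.
  echo "ray start --head --include-dashboard=false ${COMMON_FLAGS}"
}

ray_worker_cmd() {
  # $1 is the head node address (hypothetical value in the call below).
  echo "ray start --address $1 ${COMMON_FLAGS}"
}

ray_head_cmd
ray_worker_cmd "10.0.0.1:6379"
```

Keeping the shared flags in one variable means the head and the three workers cannot drift apart when a limit is tuned later.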
@@ -24,6 +24,7 @@ mods:
 # Environment variables
 env:
   VLLM_MARLIN_USE_ATOMIC_ADD: 1
+  PYTORCH_CUDA_ALLOC_CONF: expandable_segments:True

 # Default settings (can be overridden via CLI, e.g. --tensor_parallel 2)
 defaults: