Move recipe to 4x-spark-cluster/ and add UMA memory optimizations

- Move qwen3.5-397b-int4-autoround.yaml to recipes/4x-spark-cluster/ per maintainer request (multi-node recipes in separate directory)
- Add PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to recipe env
- Optimize Ray for GB10 UMA (128GB shared CPU/GPU memory):
  - Disable Ray dashboard (saves ~1.2 GiB per node)
  - Limit Ray object store to 1 GiB (default 30% of RAM = 33 GiB)
  - Disable pre-started idle workers (saves ~8 GiB on head node)
  - Set --num-cpus 2 and --disable-usage-stats on all nodes
- Net effect: ~40+ GiB freed across 4-node cluster for model/KV cache
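
The Ray settings listed above could be expressed as `ray start` arguments roughly as follows. This is a minimal sketch, not the launch script from this commit (which is not shown here): the helper name `ray_start_args` and the `HEAD_IP:6379` address are hypothetical, and the idle-worker setting is left out because the commit message does not say how it is applied. The flags themselves (`--include-dashboard`, `--object-store-memory`, `--num-cpus`, `--disable-usage-stats`) are standard `ray start` options.

```python
import shlex
from typing import List, Optional

GIB = 1024 ** 3  # --object-store-memory is given in bytes


def ray_start_args(head: bool, address: Optional[str] = None) -> List[str]:
    """Build a `ray start` command line for one node (hypothetical helper)."""
    args = ["ray", "start"]
    if head:
        # The dashboard only runs on the head node; turn it off there.
        args += ["--head", "--include-dashboard=false"]
    else:
        if address is None:
            raise ValueError("worker nodes need the head node address")
        args.append(f"--address={address}")
    # Settings applied on every node: 1 GiB object store, 2 CPUs, no usage stats.
    args += [
        f"--object-store-memory={1 * GIB}",
        "--num-cpus=2",
        "--disable-usage-stats",
    ]
    return args


if __name__ == "__main__":
    print(shlex.join(ray_start_args(head=True)))
    # HEAD_IP:6379 is a placeholder, not an address from this repository.
    print(shlex.join(ray_start_args(head=False, address="HEAD_IP:6379")))
```

Building the argument list in one place keeps the head and worker invocations consistent, so a change to the object store limit applies to all four nodes.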
2026-03-11 07:29:45 +00:00
parent 006734910c
commit 3baca14eb1
2 changed files with 10 additions and 5 deletions
--- a/recipes/4x-spark-cluster/qwen3.5-397b-int4-autoround.yaml
+++ b/recipes/4x-spark-cluster/qwen3.5-397b-int4-autoround.yaml
@@ -24,6 +24,7 @@ mods:
 # Environment variables
 env:
  VLLM_MARLIN_USE_ATOMIC_ADD: 1
+  PYTORCH_CUDA_ALLOC_CONF: expandable_segments:True

 # Default settings (can be overridden via CLI, e.g. --tensor_parallel 2)
 defaults: