From 45066e2b1674fa2522a95590db474fd28e7e8e62 Mon Sep 17 00:00:00 2001
From: Eugene Rakhmatulin
Date: Wed, 11 Mar 2026 09:57:34 -0700
Subject: [PATCH] Updated README

---
 README.md                                    | 58 +++++++++++++++++++-
 recipes/qwen3-coder-next-int4-autoround.yaml |  2 +-
 2 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 8e5c9c3..c3f1e52 100644
--- a/README.md
+++ b/README.md
@@ -149,6 +149,54 @@ Don't do it every time you rebuild, because it will slow down compilation times.
 
 For periodic maintenance, I recommend using a filter: `docker builder prune --filter until=72h`
 
+### 2026-03-11
+
+#### Qwen3-Coder-Next INT4-AutoRound Recipe
+
+Added a new recipe `qwen3-coder-next-int4-autoround` for running Intel/Qwen3-Coder-Next-int4-AutoRound. Supports single Spark only (use with `--solo` switch), since split weights are too small for Marlin kernel.
+
+```bash
+./run-recipe.sh qwen3-coder-next-int4-autoround --solo
+```
+
+### 2026-03-06
+
+#### `-e/--env` Passthrough in `run-recipe.py`
+
+`run-recipe.sh` now accepts one or more `-e VAR=VALUE` flags to pass environment variables directly to the container, mirroring the existing behaviour of `launch-cluster.sh`.
+
+```bash
+./run-recipe.sh qwen3.5-122b-int4-autoround --solo -e HF_TOKEN=$HF_TOKEN
+```
+
+#### Unsloth Chat Template for Qwen3.5
+
+Added a new mod `mods/fix-qwen3.5-chat-template` that applies the Unsloth chat template to Qwen3.5 models for better compatibility with modern clients. The template is now included in the `qwen3.5-122b-fp8`, `qwen3.5-122b-int4-autoround`, and `qwen3.5-35b-a3b-fp8` recipes.
+
+#### Fix Shell Quoting for Exec Command Arguments
+
+Fixed shell quoting for exec command arguments in `launch-cluster.sh` and `run-recipe.py` to correctly handle arguments containing spaces or special characters.
+
+### 2026-03-05
+
+#### Qwen3.5-35B-A3B-FP8 Recipe
+
+Added a new recipe `qwen3.5-35b-a3b-fp8` for running Qwen3.5-35B-A3B in FP8 format.
+
+```bash
+./run-recipe.sh qwen3.5-35b-a3b-fp8
+```
+
+#### 4× Spark Cluster Recipes
+
+Added a `recipes/4x-spark-cluster/` subdirectory with recipes optimised for a 4-node Spark cluster:
+- `minimax-m2.5` — MiniMax M2.5 on 4× Spark
+- `qwen3.5-397b-a17B-fp8` — Qwen3.5-397B-A17B in FP8 on 4× Spark
+
+#### More Robust Wheels Check Before Download
+
+Improved the wheels availability check in `build-and-copy.sh` to be more reliable when deciding whether to download remote wheels.
+
 ### 2026-03-04
 
 #### Prebuilt vLLM Wheels via GitHub Releases
@@ -162,7 +210,15 @@ The download logic mirrors the FlashInfer behaviour:
 
 No new flags are required — the download happens transparently.
 
-All prebuilt wheels are now tested with multiple models in both solo and cluster configuration as a part of automated deployment pipeline which will now run nightly. The wheels are released only if they pass all the tests and no significant performance regressions are detected.
+All prebuilt wheels are now tested with multiple models in both solo and cluster configuration as a part of automated deployment pipeline which will now run nightly. The wheels are released only if they pass all the tests and no significant performance regressions are detected.
+
+#### Qwen3.5-122B-FP8 Recipe
+
+Added a new recipe `qwen3.5-122b-fp8` for running Qwen3.5-122B in FP8 format.
+
+```bash
+./run-recipe.sh qwen3.5-122b-fp8
+```
 
 ### 2026-03-02
 
diff --git a/recipes/qwen3-coder-next-int4-autoround.yaml b/recipes/qwen3-coder-next-int4-autoround.yaml
index a438786..b308a20 100644
--- a/recipes/qwen3-coder-next-int4-autoround.yaml
+++ b/recipes/qwen3-coder-next-int4-autoround.yaml
@@ -14,7 +14,7 @@ solo_only: true
 # Container image to use
 container: vllm-node
 
-# Mod required to fix slowness and crash in the cluster (tracking https://github.com/vllm-project/vllm/issues/33857)
+# Mod required to fix autoround weight loading issues
 mods:
   - mods/fix-qwen3-next-autoround
 