Updated README
56
README.md
@@ -149,6 +149,54 @@ Don't do it every time you rebuild, because it will slow down compilation times.

For periodic maintenance, I recommend using a filter: `docker builder prune --filter until=72h`

### 2026-03-11

#### Qwen3-Coder-Next INT4-AutoRound Recipe

Added a new recipe `qwen3-coder-next-int4-autoround` for running Intel/Qwen3-Coder-Next-int4-AutoRound. It supports a single Spark only (use the `--solo` switch), since the split weights are too small for the Marlin kernel.

```bash
./run-recipe.sh qwen3-coder-next-int4-autoround --solo
```

### 2026-03-06

#### `-e/--env` Passthrough in `run-recipe.py`

`run-recipe.sh` now accepts one or more `-e VAR=VALUE` flags to pass environment variables directly to the container, mirroring the existing behaviour of `launch-cluster.sh`.

```bash
./run-recipe.sh qwen3.5-122b-int4-autoround --solo -e HF_TOKEN="$HF_TOKEN"
```

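
As a rough illustration of the passthrough idea, repeated `-e VAR=VALUE` flags can be picked out of an argument list and forwarded one by one. This is a minimal sketch and not the code in `run-recipe.sh`; the function name `env_passthrough_args` is hypothetical, and it assumes the value always follows the flag as a separate argument.

```bash
# Hypothetical helper: print the container arguments for every
# -e/--env flag found in the argument list, one argument per line.
env_passthrough_args() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -e|--env)
        # Each flag must be followed by a VAR=VALUE argument.
        if [ "$#" -lt 2 ]; then
          echo "missing VAR=VALUE after $1" >&2
          return 1
        fi
        printf '%s\n' "-e" "$2"
        shift 2
        ;;
      *)
        # Not an env flag; ignored in this sketch.
        shift
        ;;
    esac
  done
}

env_passthrough_args my-recipe --solo -e HF_TOKEN=abc -e "MSG=hello world"
```

Values containing spaces survive intact as long as the caller quotes them, which is why the sketch always expands `"$2"` inside double quotes.
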
#### Unsloth Chat Template for Qwen3.5

Added a new mod `mods/fix-qwen3.5-chat-template` that applies the Unsloth chat template to Qwen3.5 models for better compatibility with modern clients. The template is now included in the `qwen3.5-122b-fp8`, `qwen3.5-122b-int4-autoround`, and `qwen3.5-35b-a3b-fp8` recipes.

#### Fix Shell Quoting for Exec Command Arguments

Fixed shell quoting for exec command arguments in `launch-cluster.sh` and `run-recipe.py` to correctly handle arguments containing spaces or special characters.

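
For readers unfamiliar with the problem, one common way to make an argument safe for reuse on a shell command line is to wrap it in single quotes and escape any embedded single quote. The sketch below assumes a POSIX shell; the helper `shell_quote` is hypothetical and is not taken from `launch-cluster.sh` or `run-recipe.py`.

```bash
# Hypothetical helper: quote one argument so a shell re-parses it
# as a single word, whatever spaces or special characters it holds.
shell_quote() {
  # Replace each ' with '\'' and wrap the result in single quotes.
  # Note: command substitution drops trailing newlines in the argument.
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Round trip: the quoted form evaluates back to the original argument.
arg="it's a file name.txt"
quoted=$(shell_quote "$arg")
eval "set -- $quoted"
printf '%s\n' "$1"
```

Without this kind of quoting, an argument such as `it's a file name.txt` would be split into several words or cause a syntax error once it is re-parsed by another shell.
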
### 2026-03-05

#### Qwen3.5-35B-A3B-FP8 Recipe

Added a new recipe `qwen3.5-35b-a3b-fp8` for running Qwen3.5-35B-A3B in FP8 format.

```bash
./run-recipe.sh qwen3.5-35b-a3b-fp8
```

#### 4× Spark Cluster Recipes

Added a `recipes/4x-spark-cluster/` subdirectory with recipes optimised for a 4-node Spark cluster:

- `minimax-m2.5` — MiniMax M2.5 on 4× Spark
- `qwen3.5-397b-a17B-fp8` — Qwen3.5-397B-A17B in FP8 on 4× Spark

#### More Robust Wheels Check Before Download

Improved the wheels availability check in `build-and-copy.sh` to be more reliable when deciding whether to download remote wheels.

### 2026-03-04

#### Prebuilt vLLM Wheels via GitHub Releases

@@ -164,6 +212,14 @@ No new flags are required — the download happens transparently.

All prebuilt wheels are now tested with multiple models in both solo and cluster configurations as part of an automated deployment pipeline, which now runs nightly. Wheels are released only if they pass all tests and show no significant performance regressions.

#### Qwen3.5-122B-FP8 Recipe

Added a new recipe `qwen3.5-122b-fp8` for running Qwen3.5-122B in FP8 format.

```bash
./run-recipe.sh qwen3.5-122b-fp8
```

### 2026-03-02

#### Qwen3.5-122B-INT4-Autoround Support

@@ -14,7 +14,7 @@ solo_only: true
# Container image to use
container: vllm-node

-# Mod required to fix slowness and crash in the cluster (tracking https://github.com/vllm-project/vllm/issues/33857)
+# Mod required to fix autoround weight loading issues
mods:
- mods/fix-qwen3-next-autoround