Updated README, added NVFP4 fix

Eugene Rakhmatulin
2026-03-30 11:45:40 -07:00
parent a3201f8873
commit 45494688d1
2 changed files with 16 additions and 21 deletions


@@ -69,7 +69,7 @@ An initial build speed depends on your Internet connection speed and whether the
**On a single node**:
**NEW** - `launch-cluster.sh` now supports solo mode, which is now a recommended way to run the container on a single Spark:
`launch-cluster.sh` supports solo mode, which is now a recommended way to run the container on a single Spark:
```bash
./launch-cluster.sh --solo exec \
@@ -80,23 +80,6 @@ An initial build speed depends on your Internet connection speed and whether the
--load-format fastsafetensors
```
**To launch using regular `docker run`**
```bash
docker run \
--privileged \
--gpus all \
-it --rm \
--network host --ipc=host \
-v ~/.cache/huggingface:/root/.cache/huggingface \
vllm-node \
bash -c -i "vllm serve \
QuantTrio/Qwen3-VL-30B-A3B-Instruct-AWQ \
--port 8000 --host 0.0.0.0 \
--gpu-memory-utilization 0.7 \
--load-format fastsafetensors"
```
**On a cluster**
It's recommended to download the model on one node and distribute across the cluster using ConnectX interconnect prior to launching. This is to avoid re-downloading the model from the Internet on every node in the cluster.
@@ -151,7 +134,7 @@ For periodic maintenance, I recommend using a filter: `docker builder prune --fi
## CHANGELOG
### 2026-03-29
### 2026-03-30
#### Flags to specify Flashinfer ref and apply PRs
@@ -162,8 +145,6 @@ For periodic maintenance, I recommend using a filter: `docker builder prune --fi
Both flags are incompatible with `--exp-mxfp4`.
### 2026-03-27
#### Default image tag in `build-and-copy.sh`
`build-and-copy.sh` now automatically sets a sensible default image tag when `-t` is not specified: