Update README to reflect reduced build time and container size for vLLM

Eugene Rakhmatulin
2025-12-20 23:16:12 -08:00
parent dfe426e912
commit 63a1a6a97c

@@ -30,7 +30,7 @@ The Dockerfile builds from the main branch of VLLM, so depending on when you run
 - Added `--pre-flashinfer` flag to `build-and-copy.sh` to use pre-release versions of FlashInfer.
 - Added `--use-wheels [mode]` flag to `build-and-copy.sh`.
   - Allows building the container using pre-built vLLM wheels instead of compiling from source.
-  - The resulting Docker container size is reduced considerably (14GB vs 24GB)
+  - Reduced build time and container size.
   - `mode` is optional and defaults to `nightly`.
   - Supported modes: `nightly` (release wheels are broken with CUDA 13 currently).
 ### 2025-12-19