From 63a1a6a97c95f70b7fe7f9047ab735f5fa6f8aee Mon Sep 17 00:00:00 2001
From: Eugene Rakhmatulin
Date: Sat, 20 Dec 2025 23:16:12 -0800
Subject: [PATCH] Update README to reflect reduced build time and container size for vLLM

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index fdbbc44..abf766e 100644
--- a/README.md
+++ b/README.md
@@ -30,7 +30,7 @@ The Dockerfile builds from the main branch of VLLM, so depending on when you run
 - Added `--pre-flashinfer` flag to `build-and-copy.sh` to use pre-release versions of FlashInfer.
 - Added `--use-wheels [mode]` flag to `build-and-copy.sh`.
   - Allows building the container using pre-built vLLM wheels instead of compiling from source.
-  - The resulting Docker container size is reduced considerably (14GB vs 24GB)
+  - Reduced build time and container size.
   - `mode` is optional and defaults to `nightly`.
   - Supported modes: `nightly` (release wheels are broken with CUDA 13 currently).
 ### 2025-12-19