From cf8e411ad2535c1e06841c03463fd37595063461 Mon Sep 17 00:00:00 2001
From: eugr
Date: Wed, 26 Nov 2025 14:01:04 -0800
Subject: [PATCH] Added benchmarking

---
 README.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/README.md b/README.md
index 29165eb..2d87700 100644
--- a/README.md
+++ b/README.md
@@ -198,6 +198,21 @@ docker exec -it vllm_node
 ```
 
 And execute vllm command inside.
 
+## 5\. Benchmarking
+
+Follow the guidance in [vLLM Benchmark Suites](https://docs.vllm.ai/en/latest/contributing/benchmarks/) to download the benchmarking dataset, then run a benchmark with a command like this (assuming you are running on the head node; otherwise, specify the `--host` parameter):
+
+```bash
+vllm bench serve \
+    --backend vllm \
+    --model RedHatAI/Qwen3-VL-235B-A22B-Instruct-NVFP4 \
+    --endpoint /v1/completions --dataset-name sharegpt \
+    --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
+    --num-prompts 1 \
+    --port 8888
+```
+
+Modify `--num-prompts` to benchmark concurrent requests; the command above measures single-request performance.
 
 ### Hardware Architecture
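
The added section suggests modifying `--num-prompts` to compare single-request and concurrent performance. One way to do that is a small sweep script; a minimal sketch, where the concurrency levels are illustrative choices (not part of the patch) and the `vllm` invocation is skipped when the binary is not on `PATH`, e.g. outside the container:

```bash
#!/bin/sh
# Sketch: sweep --num-prompts to compare single-request vs. concurrent
# performance. The levels below are illustrative; tune them to your cluster.
for n in 1 4 16 64; do
  echo "benchmark run: num-prompts=$n"
  # Skip the actual benchmark when vllm is not installed (e.g. outside the container).
  command -v vllm >/dev/null 2>&1 || continue
  vllm bench serve \
      --backend vllm \
      --model RedHatAI/Qwen3-VL-235B-A22B-Instruct-NVFP4 \
      --endpoint /v1/completions --dataset-name sharegpt \
      --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
      --num-prompts "$n" \
      --port 8888
done
```

Each run prints its own throughput and latency summary, so comparing the `n=1` run against the higher-concurrency runs shows how the deployment scales with request load.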