From cf8e411ad2535c1e06841c03463fd37595063461 Mon Sep 17 00:00:00 2001
From: eugr
Date: Wed, 26 Nov 2025 14:01:04 -0800
Subject: [PATCH] Added benchmarking

---
 README.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/README.md b/README.md
index 29165eb..2d87700 100644
--- a/README.md
+++ b/README.md
@@ -198,6 +198,21 @@ docker exec -it vllm_node
 ```
 
 And execute vllm command inside.
 
+## 5\. Benchmarking
+
+Follow the guidance in [vLLM Benchmark Suites](https://docs.vllm.ai/en/latest/contributing/benchmarks/) to download the benchmarking dataset, then run a benchmark with a command like this (assuming you are running on the head node; otherwise, specify the `--host` parameter):
+
+```bash
+vllm bench serve \
+    --backend vllm \
+    --model RedHatAI/Qwen3-VL-235B-A22B-Instruct-NVFP4 \
+    --endpoint /v1/completions --dataset-name sharegpt \
+    --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
+    --num-prompts 1 \
+    --port 8888
+```
+
+Modify `--num-prompts` to benchmark concurrent requests; the command above measures single-request performance.
 
 ### Hardware Architecture
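
The added section suggests modifying `--num-prompts` to compare single-request and concurrent performance. One way to do that is a small sweep script; a minimal sketch, where the concurrency levels are illustrative choices (not part of the patch) and the `vllm` invocation is skipped when the binary is not on `PATH`, e.g. outside the container:

```bash
#!/bin/sh
# Sketch: sweep --num-prompts to compare single-request vs. concurrent
# performance. The levels below are illustrative; tune them to your cluster.
for n in 1 4 16 64; do
  echo "benchmark run: num-prompts=$n"
  # Skip the actual benchmark when vllm is not installed (e.g. outside the container).
  command -v vllm >/dev/null 2>&1 || continue
  vllm bench serve \
      --backend vllm \
      --model RedHatAI/Qwen3-VL-235B-A22B-Instruct-NVFP4 \
      --endpoint /v1/completions --dataset-name sharegpt \
      --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
      --num-prompts "$n" \
      --port 8888
done
```

Each run prints its own throughput and latency summary, so comparing the `n=1` run against the higher-concurrency runs shows how the deployment scales with request load.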