spark-vllm-docker

Author	SHA1	Message	Date
Eugene Rakhmatulin	9dc09bd04b	Renamed recipe for qwen3.5-35b-a3b-fp8 to match others	2026-03-06 13:56:06 -08:00
eugr	d148d95a19	Merge pull request #80 from oliverjohnwilson/recipe-add_minimax-m2.5_qwen3.5-397b-a17B-fp8: added minimax-m2.5 and qwen3.5-397b-a17B-fp8 recipes to a recipes/4x-spark-cluster/ subdirectory	2026-03-06 11:46:37 -08:00
eugr	3fabd3fb1c	Merge pull request #72 from erikvullings/main: Add Qwen35-35B-A3B recipe in FP8 format	2026-03-05 16:27:50 -08:00
Eugene Rakhmatulin	a749fcce87	Added a recipe for qwen3.5-122B-FP8	2026-03-04 16:49:39 -08:00
oliverjohnwilson	4303f8b6d0	added minimax-m2.5 and qwen3.5-397b-a17B-fp8 recipes to a recipes/4x-spark-cluster/ subdirectory	2026-03-04 16:01:37 -06:00
Erik Vullings	163f23d85b	Update qwen35-35b-a3b-fp8.yaml: --max_num_batched_tokens is a default variable now, which can be overridden via the CLI	2026-03-03 12:46:12 +01:00
Eugene Rakhmatulin	7d8465fd9c	Added recipe for qwen3.5-122b-int4-autoround, updated README	2026-03-02 12:18:16 -08:00
Erik Vullings	e8f94d6b8b	Add Qwen35-35B-A3B recipe in FP8 format	2026-02-27 17:46:06 +01:00
Eugene Rakhmatulin	4c8f90395b	Changed reasoning parser in MiniMax for better compatibility with modern clients (like coding tools).	2026-02-21 11:53:13 -08:00
Eugene Rakhmatulin	5b2313dddb	Changed KV type to fp8 in qwen3-coder-next recipe and reduced default context size to 131072 to ensure it all fits in a single Spark.	2026-02-17 13:07:54 -08:00
Eugene Rakhmatulin	1e7f2d5640	Small fix for M2.5 recipe	2026-02-16 11:38:34 -08:00
Eugene Rakhmatulin	24f42be5cc	Added a recipe for MiniMax M2.5 AWQ	2026-02-16 11:35:53 -08:00
Eugene Rakhmatulin	701147b1eb	Qwen3-Coder-Next fixes and updated recipe	2026-02-12 15:56:32 -08:00
Eugene Rakhmatulin	c6b245cfe8	Added prefix caching to nemotron recipe	2026-02-10 18:25:01 -08:00
Eugene Rakhmatulin	74876dd442	Added recipes for nemotron-nano-3 and qwen3-coder-next	2026-02-09 14:33:35 -08:00
Raphael Amorim	6943a51ced	Adding tests and refactoring repeated methods	2026-02-09 17:21:32 -05:00
Raphael Amorim	b7c3cdcfcb	Enhancement: add -- pass-through for arbitrary vLLM arguments. Implements Unix-style pass-through allowing any vLLM argument to be passed after the `--` separator; arguments are appended verbatim to the generated vLLM command. Examples: `./run-recipe.py model --solo -- --load-format safetensors`; `./run-recipe.py model --solo -- --served-model-name my-api`; `./run-recipe.py model --solo -- -cc.cudagraph_mode=PIECEWISE`. Features: uses parse_known_args() to capture arguments after --; warns when extra args duplicate CLI overrides (--port, --tp, etc.); works in both solo and cluster modes. Adds 10 integration tests covering: --load-format, --served-model-name, equals syntax; multiple arguments, empty --, cluster mode; duplicate detection warnings for port/tp/gpu-mem. Closes #30	2026-02-08 02:36:49 -05:00
Eugene Rakhmatulin	ec987259a0	Recipes and Launch Script support	2026-02-04 12:01:53 -08:00
Raphael Amorim	30f16f1d4e	feat: Add recipe-based one-click model deployment system. Introduces a YAML recipe system for simplified model deployment: run-recipe.py (main script handling build, download, and launch); run-recipe.sh (Bash wrapper for dependency management); recipes/ (pre-configured recipes for common models): glm-4.7-flash-awq.yaml (GLM-4.7-Flash with AWQ quantization), glm-4.7-nvfp4.yaml (GLM-4.7 with NVFP4, cluster-only), minimax-m2-awq.yaml (MiniMax M2 with AWQ), openai-gpt-oss-120b.yaml (OpenAI GPT-OSS 120B with MXFP4). Key features: auto-discover cluster nodes with --discover, saves to .env; load nodes from .env automatically on subsequent runs; cluster_only flag for models requiring multi-node setup; build_args field for Dockerfile selection (--pre-tf, --exp-mxfp4); solo mode auto-strips --distributed-executor-backend ray; --setup flag for full build + download + run workflow; --dry-run to preview execution without running. Usage: `./run-recipe.sh --discover` (find and save cluster nodes); `./run-recipe.sh glm-4.7-flash-awq --solo --setup`; `./run-recipe.sh glm-4.7-nvfp4 --setup` (uses nodes from .env)	2026-02-03 16:09:12 -05:00

19 Commits
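
The `--` pass-through described in commit b7c3cdcfcb can be sketched roughly as follows. This is a minimal illustration, not the code in run-recipe.py: it splits the argument list on the first bare `--` by hand instead of relying on parse_known_args(), and the names `split_passthrough`, `find_duplicates`, `build_command`, the `OVERRIDE_FLAGS` set, and the placeholder base command are all assumptions made for the example.

```python
import argparse
import sys

# Hypothetical subset of launcher overrides; the real script may track more.
OVERRIDE_FLAGS = {"--port", "--tp"}


def split_passthrough(argv):
    """Split argv at the first bare `--` into (launcher_args, extra_args)."""
    if "--" in argv:
        idx = argv.index("--")
        return argv[:idx], argv[idx + 1:]
    return list(argv), []


def find_duplicates(extra_args):
    """Return launcher override flags that reappear in the pass-through args."""
    found = []
    for arg in extra_args:
        flag = arg.split("=", 1)[0]  # also matches the --port=9000 form
        if flag in OVERRIDE_FLAGS and flag not in found:
            found.append(flag)
    return found


def build_command(argv, base_cmd=("vllm", "serve", "<model>")):
    """Parse launcher flags, then append pass-through args verbatim."""
    launcher_args, extra_args = split_passthrough(argv)

    parser = argparse.ArgumentParser()
    parser.add_argument("recipe")
    parser.add_argument("--solo", action="store_true")
    parser.add_argument("--port", type=int)
    ns = parser.parse_args(launcher_args)

    for flag in find_duplicates(extra_args):
        print(f"warning: {flag} is also given after --", file=sys.stderr)

    cmd = list(base_cmd)  # placeholder for the recipe-generated command
    if ns.port is not None:
        cmd += ["--port", str(ns.port)]
    return cmd + extra_args
```

Calling `build_command(["model", "--solo", "--", "--load-format", "safetensors"])` returns the base command with `--load-format safetensors` appended unchanged. Splitting by hand keeps the behaviour independent of how a given Python version treats `--` inside argparse.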