Examples
A collection of examples demonstrating how to use vLLM. All documented examples are autogenerated by docs/source/generate_examples.py from the examples found in the examples/ directory.
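The page titles in the list below appear to be derived from example file names. For instance, a file named something like openai_completion_client.py would become "OpenAI Completion Client". The following is a minimal sketch of that kind of mapping. It assumes a simple snake_case-to-Title-Case rule plus a small acronym table. ACRONYMS, title_from_path and collect_titles are hypothetical names, and this is not the actual logic of docs/source/generate_examples.py.

```python
from pathlib import Path

# Hypothetical acronym table (an assumption); the real generator's special cases may differ.
ACRONYMS = {"openai": "OpenAI", "llm": "LLM", "lora": "LoRA", "tpu": "TPU", "api": "API"}


def title_from_path(path: Path) -> str:
    """Turn a file or directory name such as 'openai_completion_client.py' into a title."""
    # Treat hyphens like underscores, then split into words.
    words = path.stem.replace("-", "_").split("_")
    # Use the acronym spelling when known, otherwise capitalize the word.
    return " ".join(ACRONYMS.get(w.lower(), w.capitalize()) for w in words if w)


def collect_titles(root: Path) -> dict[str, list[str]]:
    """Group example titles by their category directory (e.g. offline_inference).

    Only top-level *.py files inside each category directory are considered in this sketch.
    """
    groups: dict[str, list[str]] = {}
    for path in sorted(root.glob("*/*.py")):
        category = title_from_path(Path(path.parent.name))
        groups.setdefault(category, []).append(title_from_path(path))
    return groups
```

Titles such as Mistral-Small and Helm Charts do not follow this simple rule, so the real generator evidently handles more cases than this sketch.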
- Offline Inference
  - Audio Language
  - Basic
  - Batch LLM Inference
  - Chat With Tools
  - Data Parallel
  - Disaggregated Prefill
  - Eagle
  - Embed Jina Embeddings V3
  - Embed Matryoshka Fy
  - Encoder Decoder
  - Encoder Decoder Multimodal
  - LLM Engine Example
  - Load Sharded State
  - LoRA With Quantization Inference
  - Mistral-Small
  - MLPSpeculator
  - MultiLoRA Inference
  - Neuron
  - Neuron Eagle
  - Neuron INT8 Quantization
  - Neuron Speculation
  - Offline Inference with the OpenAI Batch file format
  - Prefix Caching
  - Prithvi Geospatial MAE
  - Profiling
  - vLLM TPU Profiling
  - Qwen2.5-Omni Offline Inference Examples
  - Qwen 1M
  - Reproducibility
  - RLHF
  - RLHF Colocate
  - RLHF Utils
  - Save Sharded State
  - Simple Profiling
  - Structured Outputs
  - Torchrun Example
  - TPU
  - Vision Language
  - Vision Language Embedding
  - Vision Language Multi Image
- Online Serving
  - API Client
  - Helm Charts
  - Cohere Rerank Client
  - Disaggregated Prefill
  - Gradio OpenAI Chatbot Webserver
  - Gradio Webserver
  - Jinaai Rerank Client
  - Kv Events
  - Kv Events Subscriber
  - Multi-Node-Serving
  - OpenAI Chat Completion Client
  - OpenAI Chat Completion Client For Multimodal
  - OpenAI Chat Completion Client With Tools
  - OpenAI Chat Completion Client With Tools Required
  - OpenAI Chat Completion Structured Outputs
  - OpenAI Chat Completion Structured Outputs Structural Tag
  - OpenAI Chat Completion Structured Outputs With Reasoning
  - OpenAI Chat Completion Tool Calls With Reasoning
  - OpenAI Chat Completion With Reasoning
  - OpenAI Chat Completion With Reasoning Streaming
  - OpenAI Chat Embedding Client For Multimodal
  - OpenAI Classification Client
  - OpenAI Completion Client
  - OpenAI Cross Encoder Score
  - OpenAI Embedding Client
  - OpenAI Embedding Matryoshka Fy
  - OpenAI Pooling Client
  - OpenAI Transcription Client
  - Setup OpenTelemetry POC
  - Prometheus and Grafana
  - Ray Serve Deepseek
  - Retrieval Augmented Generation With Langchain
  - Retrieval Augmented Generation With Llamaindex
  - Run Cluster
  - Sagemaker-Entrypoint
  - Streamlit OpenAI Chatbot Webserver
- Other
  - LMCache Examples