Online Serving#
These examples demonstrate how to use vLLM in an online setting, where the model runs behind a server and is queried for predictions in real time.
- API Client
- Helm Charts
- Disaggregated Prefill
- Gradio OpenAI Chatbot Webserver
- Gradio Webserver
- OpenAI Chat Completion Client
- OpenAI Chat Completion Client For Multimodal
- OpenAI Chat Completion Client With Tools
- OpenAI Chat Completion Structured Outputs
- OpenAI Chat Embedding Client For Multimodal
- OpenAI Completion Client
- OpenAI Cross Encoder Score
- OpenAI Embedding Client
- OpenAI Pooling Client
- Setup OpenTelemetry POC
- Prometheus and Grafana
- Run Cluster
- SageMaker Entrypoint
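The OpenAI-compatible client examples above all target the same HTTP API that vLLM exposes. As a minimal sketch (assuming a server started with `vllm serve <model>` listening on the default `http://localhost:8000`, and a placeholder model name), a chat completion request can be issued with only the Python standard library:

```python
import json
import urllib.request

# Assumptions: vLLM's OpenAI-compatible server is listening on port 8000
# and exposes /v1/chat/completions. The model name is a placeholder;
# use the name you passed to `vllm serve`.
BASE_URL = "http://localhost:8000/v1"
MODEL = "your-model-name"


def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }


def send_chat_request(prompt: str) -> str:
    """POST the payload to the server and return the reply text.

    Requires a running vLLM server at BASE_URL.
    """
    body = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

The same request shape is what the official `openai` Python client sends under the hood, which is why the dedicated client examples above work against vLLM without modification.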