Skip to main content
Ctrl+K
production-stack - Home production-stack - Home

Getting Started

  • Installation
  • Minimal Example
  • Troubleshooting

Deployment

  • Helm Charts
  • Cloud Environments
    • Amazon Web Services
    • Google Cloud Platform
    • Azure Kubernetes Service
  • Ray Deployment

User Manual

  • Router Configuration
    • CRD based configuration (recommended)
    • JSON based configuration
    • Command Line based configuration
  • LORA Configuration
    • CRD based configuration (recommended)
    • Manually Load LORA
  • KV Cache Offloading

Developer Guide

  • Peripheral
    • Observability Models
    • Interfaces
  • Developer API
    • Router Logic
    • Engine Stats
    • Request Stats
    • Service Discovery

Tutorials

  • How to Guides
    • Disaggregated Prefill
    • KV Cache Offloading
    • LORA Loading

Benchmarks

  • Multi-round QA
  • Repository
  • Suggest edit
  • .rst

Developer API

Developer API#

Developer Guide

  • Router Logic
  • Engine Stats
  • Request Stats
  • Service Discovery

previous

Interfaces

next

Router Logic

By vLLM Production Stack Team

© Copyright 2025, vLLM Production Stack Team.