Skip to main content
Ctrl+K
production-stack - Home production-stack - Home

Getting Started

  • Installation
  • Minimal Example
  • Troubleshooting

Deployment

  • Helm Charts
  • Cloud Environments
    • Amazon Web Services
    • Google Cloud Platform
    • Azure Kubernetes Service
  • Ray Deployment

User Manual

  • Router Configuration
    • CRD based configuration (recommended)
    • JSON based configuration
    • Command Line based configuration
  • LORA Configuration
    • CRD based configuration (recommended)
    • Manually Load LORA
  • KV Cache Offloading

Developer Guide

  • Peripheral
    • Observability Models
    • Interfaces
  • Developer API
    • Router Logic
    • Engine Stats
    • Request Stats
    • Service Discovery

Tutorials

  • How to Guides
    • Disaggregated Prefill
    • KV Cache Offloading
    • LORA Loading

Benchmarks

  • Multi-round QA
  • Repository
  • Suggest edit
  • .rst

Peripheral

Peripheral#

Developer Guide

  • Observability Models
  • Interfaces

previous

KV Cache Offloading

next

Observability Models

By vLLM Production Stack Team

© Copyright 2025, vLLM Production Stack Team.