Skip to main content
Ctrl+K
production-stack - Home production-stack - Home

Getting Started

  • Installation
  • Minimal Example
  • Troubleshooting

Tutorials

  • Disaggregated Prefill
  • KV Cache Offloading
  • LORA Loading
  • KV Cache Aware Routing
  • Prefix Aware Routing

Deployment

  • Helm Charts
  • Cloud Environments
    • Amazon Web Services
    • Google Cloud Platform
    • Azure Kubernetes Service
  • Ray Deployment

User Manual

  • Router Configuration
    • CRD based configuration (recommended)
    • JSON based configuration
    • Command Line based configuration
  • LORA Configuration
    • CRD based configuration (recommended)
    • Manually Load LORA
  • KV Cache Offloading

Developer Guide

  • Peripheral
    • Observability Models
    • Interfaces
  • Developer API
    • Router Logic
    • Engine Stats
    • Request Stats
    • Service Discovery

Benchmarks

  • Multi-round QA
  • Repository
  • Suggest edit
  • .rst

Router Configuration

Router Configuration#

Test text

User Manual

  • CRD based configuration (recommended)
  • JSON based configuration
  • Command Line based configuration

previous

Ray Deployment

next

CRD based configuration (recommended)

By vLLM Production Stack Team

© Copyright 2025, vLLM Production Stack Team.