Examples
- API Client
- AQLM Example
- CPU Offload
- GGUF Inference
- Gradio OpenAI Chatbot Webserver
- Gradio Webserver
- LLM Engine Example
- LoRA With Quantization Inference
- MultiLoRA Inference
- Offline Chat With Tools
- Offline Inference
- Offline Inference Arctic
- Offline Inference Audio Language
- Offline Inference Chat
- Offline Inference Distributed
- Offline Inference Embedding
- Offline Inference Encoder Decoder
- Offline Inference MLPSpeculator
- Offline Inference Neuron
- Offline Inference Neuron Int8 Quantization
- Offline Inference Pixtral
- Offline Inference TPU
- Offline Inference Vision Language
- Offline Inference Vision Language Multi Image
- Offline Inference With Prefix
- Offline Inference With Profiler
- OpenAI Audio API Client
- OpenAI Chat Completion Client
- OpenAI Chat Completion Client With Tools
- OpenAI Completion Client
- OpenAI Embedding Client
- OpenAI Vision API Client
- Save Sharded State
- Tensorize vLLM Model