Examples#
- API Client
- AQLM Example
- CPU Offload
- Florence-2 Inference
- GGUF Inference
- Gradio OpenAI Chatbot Webserver
- Gradio Webserver
- LLM Engine Example
- LoRA With Quantization Inference
- MultiLoRA Inference
- Offline Chat With Tools
- Offline Inference
- Offline Inference Arctic
- Offline Inference Audio Language
- Offline Inference Chat
- Offline Inference CLI
- Offline Inference Distributed
- Offline Inference Embedding
- Offline Inference Encoder Decoder
- Offline Inference MLPSpeculator
- Offline Inference Neuron
- Offline Inference Neuron INT8 Quantization
- Offline Inference Pixtral
- Offline Inference Structured Outputs
- Offline Inference TPU
- Offline Inference Vision Language
- Offline Inference Vision Language Embedding
- Offline Inference Vision Language Multi Image
- Offline Inference With Prefix
- Offline Inference With Profiler
- Offline Profile
- OpenAI Chat Completion Client
- OpenAI Chat Completion Client For Multimodal
- OpenAI Chat Completion Client With Tools
- OpenAI Chat Completion Structured Outputs
- OpenAI Chat Embedding Client For Multimodal
- OpenAI Completion Client
- OpenAI Cross Encoder Score
- OpenAI Embedding Client
- Save Sharded State
- Tensorize vLLM Model