Tutorials
Step-by-step tutorials to guide you through complete workflows, from data preparation to serving trained models in production.
Serve in vLLM
Deploy your trained speculator models in vLLM for production inference.
Time required: ~5 minutes
Train Eagle-3 Model Online
Learn how to train an Eagle-3 speculator using online training, where hidden states are generated on demand during training.
Time required: ~30 minutes
Train Eagle-3 Model Offline
Learn how to train an Eagle-3 speculator using offline training with pre-generated hidden states.
Time required: ~3 hours
Train DFlash Model Online
Learn how to train a DFlash speculator model with block-based token generation.
Time required: ~25 minutes
Train P-eagle Model Offline
Learn how to train a P-eagle speculator model with COD sampling.
Time required: ~50 minutes
Train MTP Model Online
Learn how to fine-tune a model's native MTP head on domain-specific data using online training.
Time required: ~8 minutes for Qwen3.5-9B on 2x H200 GPUs (varies with model size)
Response Regeneration
Regenerate dataset responses with your target model so the drafter is trained on outputs that better match the target model.
Time required: ~10 minutes
Evaluating Model Performance
Benchmark and evaluate your trained speculator models.