Tutorials

Step-by-step tutorials to guide you through complete workflows, from data preparation to serving trained models in production.

Serve in vLLM

Deploy your trained speculator models in vLLM for production inference.

Time required: ~5 minutes

Train Eagle-3 Model Online

Learn how to train an Eagle-3 speculator using online training, where hidden states are generated on demand during training.

Time required: ~30 minutes

Train Eagle-3 Model Offline

Learn how to train an Eagle-3 speculator using offline training with pre-generated hidden states.

Time required: ~3 hours

Train DFlash Model Online

COMING SOON

Learn how to train a DFlash speculator model with block-based token generation.

Response Regeneration

Regenerate dataset responses with your target model to improve drafter alignment.

Time required: ~10 minutes

Evaluating Model Performance

COMING SOON

Benchmark and evaluate your trained speculator models.