Tutorials

Step-by-step tutorials to guide you through complete workflows, from data preparation to serving trained models in production.

Serve in vLLM

Deploy your trained speculator models in vLLM for production inference.

Time required: ~5 minutes

Learn how to train an Eagle-3 speculator using online training, where hidden states are generated on demand during training.

Time required: ~30 minutes

Learn how to train an Eagle-3 speculator using offline training with pre-generated hidden states.

Time required: ~3 hours

COMING SOON

Learn how to train a DFlash speculator model with block-based token generation.

Regenerate dataset responses using your target model for improved drafter alignment.

Time required: ~10 minutes

COMING SOON

Benchmark and evaluate your trained speculator models.