Algorithms

Speculators supports four speculative decoding algorithms. All are lossless: their output follows the same distribution as the target model's.
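The lossless property can be illustrated with the standard speculative sampling acceptance rule. The sketch below is a toy, single-token version written for this page, not code from Speculators or vLLM; `verify_token`, `empirical_distribution`, and the example probability vectors are hypothetical names and values chosen for illustration. It assumes the draft token is sampled from the draft distribution `q` and checked against the target distribution `p`.

```python
import random


def verify_token(p, q, x, rng):
    """Accept or replace one draft token (illustrative helper).

    p: target-model probabilities over the vocabulary
    q: draft-model probabilities over the vocabulary
    x: index of the token proposed by the draft model (sampled from q)
    Returns (token, accepted).
    """
    # Accept the draft token with probability min(1, p[x] / q[x]).
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # On rejection, resample from the residual max(p - q, 0).
    # Rejection implies q[x] > p[x], so the residual has positive mass.
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    token = rng.choices(range(len(p)), weights=residual)[0]
    return token, False


def empirical_distribution(p, q, n, seed=0):
    """Draft from q, verify against p, and return observed token frequencies."""
    rng = random.Random(seed)
    counts = [0] * len(p)
    for _ in range(n):
        x = rng.choices(range(len(q)), weights=q)[0]  # draft proposal
        token, _ = verify_token(p, q, x, rng)
        counts[token] += 1
    return [c / n for c in counts]


if __name__ == "__main__":
    target = [0.6, 0.3, 0.1]  # assumed toy target distribution
    draft = [0.2, 0.5, 0.3]   # assumed toy draft distribution
    print(empirical_distribution(target, draft, 100_000))
```

Even though the draft distribution differs markedly from the target, the observed frequencies should approach the target probabilities as the sample count grows. A poor draft model lowers the acceptance rate, and therefore the speedup, but does not change the output distribution.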

Eagle-3

Predicts draft tokens autoregressively using Llama-style draft layers. The most established of the algorithms, with mature support in both Speculators and vLLM.

P-EAGLE

Extends Eagle-3 with parallel multi-token prediction across multiple depths, using COD sampling for memory-efficient training.

DFlash

Predicts all draft tokens in a single forward pass using block-based prediction with Qwen3-style draft layers. A newer algorithm whose support is improving rapidly.

MTP

Fine-tunes the model's native multi-token prediction (MTP) head on domain-specific data. Available only for models with built-in MTP support (e.g., Qwen3-Next, Qwen3.5).

Choosing an Algorithm

Eagle-3, P-EAGLE, and DFlash can be paired with any supported verifier model; MTP requires a model with a built-in MTP head. For help choosing between them, see the Decision Guide.

Adding New Algorithms

See the Developer Guide for instructions on adding custom algorithms to Speculators.