Algorithms

Speculators supports two speculative decoding algorithms. Both are lossless: their output follows the same distribution as output sampled from the target model directly.
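The lossless property comes from the standard accept/reject rule used in speculative sampling, which can be checked on a toy vocabulary. The sketch below is not part of Speculators or vLLM; the function name `speculative_output_distribution` and the example probabilities are made up for illustration, and it assumes the target and draft distributions share the same vocabulary. It computes, exactly rather than by sampling, the distribution of the token emitted by one verification step.

```python
from typing import Dict


def speculative_output_distribution(
    target: Dict[str, float], draft: Dict[str, float]
) -> Dict[str, float]:
    """Exact distribution of the token emitted by one accept/reject step.

    Hypothetical helper for illustration only; assumes both dicts have the
    same keys and each sums to 1.
    """
    # A token x is drafted with probability q(x) and accepted with
    # probability min(1, p(x) / q(x)), so it is emitted via acceptance
    # with probability min(p(x), q(x)).
    accepted = {tok: min(target[tok], draft[tok]) for tok in target}
    reject_prob = 1.0 - sum(accepted.values())

    # On rejection, the token is resampled from the normalised residual
    # max(p(x) - q(x), 0).
    residual = {tok: max(target[tok] - draft[tok], 0.0) for tok in target}
    residual_mass = sum(residual.values())

    out = {}
    for tok in target:
        resampled = 0.0
        if residual_mass > 0.0:
            resampled = reject_prob * residual[tok] / residual_mass
        out[tok] = accepted[tok] + resampled
    return out


# Toy distributions (assumed values): the draft disagrees with the target.
target = {"a": 0.6, "b": 0.3, "c": 0.1}
draft = {"a": 0.2, "b": 0.5, "c": 0.3}

result = speculative_output_distribution(target, draft)
for tok in target:
    print(tok, round(result[tok], 6), target[tok])
```

Even though the draft distribution differs noticeably from the target, the emitted-token distribution matches the target. A poor draft lowers the acceptance rate, and therefore the speedup, but not output quality.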

Eagle-3

Predicts draft tokens autoregressively, one token per draft step, using Llama-style draft layers. It is the more established of the two algorithms, with mature support in both Speculators and vLLM.

DFlash

Predicts all draft tokens in a single forward pass using block-based prediction with Qwen3-style draft layers. It is the newer of the two algorithms, and support for it is improving rapidly.
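The difference between autoregressive drafting and single-pass block drafting can be sketched in terms of how many draft forward passes are needed to propose k tokens. This is a toy illustration only: `draft_autoregressive`, `draft_block`, `toy_step`, and `toy_block_step` are hypothetical names, the "models" are simple arithmetic stand-ins, and none of it reflects the actual Eagle-3 or DFlash architectures or the Speculators API.

```python
from typing import Callable, List, Tuple


def draft_autoregressive(
    step: Callable[[List[int]], int], prefix: List[int], k: int
) -> Tuple[List[int], int]:
    """Draft k tokens one at a time; each token costs one draft pass."""
    tokens = list(prefix)
    passes = 0
    for _ in range(k):
        # Each new token is conditioned on everything drafted so far.
        tokens.append(step(tokens))
        passes += 1
    return tokens[len(prefix):], passes


def draft_block(
    block_step: Callable[[List[int], int], List[int]], prefix: List[int], k: int
) -> Tuple[List[int], int]:
    """Draft k tokens with a single call that predicts the whole block."""
    drafted = block_step(list(prefix), k)
    return drafted, 1


# Toy stand-ins for draft models (assumptions, not real model calls).
def toy_step(tokens: List[int]) -> int:
    return (tokens[-1] + 1) % 100


def toy_block_step(tokens: List[int], k: int) -> List[int]:
    return [(tokens[-1] + i) % 100 for i in range(1, k + 1)]


ar_tokens, ar_passes = draft_autoregressive(toy_step, [7], 4)
block_tokens, block_passes = draft_block(toy_block_step, [7], 4)
print(ar_tokens, ar_passes)
print(block_tokens, block_passes)
```

In this toy setup both strategies propose the same four tokens, but the autoregressive loop takes four draft passes and the block drafter takes one. In practice the two approaches also differ in draft quality and acceptance rate, which this sketch does not model.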

Choosing an Algorithm

Both algorithms can be paired with any supported verifier model. For help choosing between them, see the Decision Guide.

Adding New Algorithms

See the Developer Guide for instructions on adding custom algorithms to Speculators.