Algorithm Decision Guide
Speculators currently supports two speculative decoding algorithms: Eagle-3 and DFlash. Both are lossless -- they produce output from the same distribution as the target model.
How They Differ
Eagle-3 predicts draft tokens autoregressively, one at a time.
DFlash predicts all draft tokens in a single forward pass using block-based prediction with anchor points.
Both algorithms can be paired with any supported verifier model (including quantized variants) -- the draft architecture is independent of the verifier architecture. The draft layers are always trained from scratch, so the choice of draft architecture doesn't constrain which target models you can accelerate.
Current Support
| Eagle-3 | DFlash | |
|---|---|---|
| Draft layers | Llama-style | Qwen3-style |
| Verifier models | Any supported | Any supported |
| Speculators | Mature | Newer, growing fast |
| vLLM | Mature | Newer, growing fast |
Eagle-3 has been available longer and has broader support in both Speculators and vLLM. DFlash was added more recently and support is improving rapidly.
Which Should I Use?
If you're unsure, start with Eagle-3 -- it has the most mature tooling and documentation. If you want to experiment with DFlash's single-forward-pass approach, the training workflow is the same.
For more details on each algorithm, see: