P-EAGLE

P-EAGLE (Parallel EAGLE) extends Eagle-3 with parallel multi-token prediction. Instead of drafting tokens autoregressively one at a time, P-EAGLE predicts multiple tokens in parallel. It uses Conditional Drop-token (COD) sampling during training for memory efficiency, and Llama-style transformer layers inherited from Eagle-3. It can be paired with any supported verifier model.

How It Works

Architecture

P-EAGLE builds on the Eagle-3 architecture: the target model produces hidden states at selected layers, which are concatenated, projected, and passed through Llama-style decoder layers. The key difference is that P-EAGLE adds multiple prediction depths -- at each position, the model predicts not just the next token but several tokens ahead in parallel. Each depth level d makes predictions for position (anchor + d) in the sequence.
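
As a rough illustration of the depth indexing described above, the following Python sketch enumerates which sequence position each depth would target. The function name `depth_targets` is hypothetical, positions are assumed to be zero-based, and skipping targets that fall past the end of the sequence is an assumption of this sketch rather than documented P-EAGLE behavior.

```python
def depth_targets(seq_len, num_depths):
    """Map each depth d to the (anchor, target) pairs it would predict.

    Follows the rule from the text: depth d predicts position anchor + d.
    Targets beyond the end of the sequence are skipped (sketch assumption).
    """
    targets = {}
    for d in range(num_depths):
        targets[d] = [
            (anchor, anchor + d)
            for anchor in range(seq_len)
            if anchor + d < seq_len
        ]
    return targets


if __name__ == "__main__":
    for depth, pairs in depth_targets(seq_len=5, num_depths=3).items():
        print(depth, pairs)
```

Every depth shares the same anchors, which is what lets all depths be predicted in one pass instead of one token at a time.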

COD Sampling

Training a parallel multi-depth model naively would require memory proportional to num_depths × sequence_length. P-EAGLE uses Conditional Drop-token (COD) sampling to reduce this cost:

Depth 0 retains all n positions
Depth d retains approximately n × r^d positions, where r is the down-sample ratio (`--down-sample-ratio`)
A minimum retention floor (`--down-sample-ratio-min`) prevents deep levels from being sampled too sparsely

This geometric decay means deeper predictions train on fewer positions per batch, keeping memory usage manageable while still learning to predict multiple tokens ahead.
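
To make the geometric decay concrete, here is a minimal sketch that computes how many positions each depth would keep. The helper name `cod_retained_counts` is hypothetical, and the way the floor is applied (as a lower bound on the retention ratio, followed by rounding) is an assumption; the actual implementation may apply it differently.

```python
def cod_retained_counts(n, num_depths, ratio=0.7, ratio_min=0.2):
    """Approximate number of positions kept at each depth under COD sampling.

    Assumption of this sketch: the floor bounds the retention ratio, so
    kept(d) = round(n * max(ratio ** d, ratio_min)).
    """
    return [round(n * max(ratio ** d, ratio_min)) for d in range(num_depths)]


if __name__ == "__main__":
    n, num_depths = 1000, 4
    kept = cod_retained_counts(n, num_depths)
    print("kept per depth:", kept)
    print("total with COD:", sum(kept), "vs. naive:", n * num_depths)
```

Under these assumptions, with n = 1000 and the default ratio of 0.7, four depths keep 1000, 700, 490, and 343 positions, for 2533 in total compared with 4000 when every depth keeps every position. The floor only takes effect at deeper levels: at depth 5, 0.7^5 is about 0.17, so the ratio is held at 0.2.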

Inference Process

P-EAGLE drafts multiple tokens in parallel across all depths in a single pass
The target model verifies all draft tokens in one forward pass
The longest correct prefix is accepted
Drafting repeats from the last accepted token
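
The acceptance step above can be sketched as follows. This assumes greedy verification, where each draft token is compared with the token the target model would have chosen at the same position; the function name `accept_longest_prefix` and the token values are illustrative only, and a real verifier may use a probabilistic acceptance rule instead.

```python
def accept_longest_prefix(draft_tokens, target_tokens):
    """Return the longest prefix of draft_tokens that matches target_tokens.

    draft_tokens:  tokens proposed by the draft model in one parallel pass.
    target_tokens: tokens the target model selects at the same positions
                   (greedy verification is an assumption of this sketch).
    """
    accepted = []
    for drafted, verified in zip(draft_tokens, target_tokens):
        if drafted != verified:
            break  # everything after the first mismatch is discarded
        accepted.append(drafted)
    return accepted


if __name__ == "__main__":
    draft = [5, 7, 9, 2]
    target = [5, 7, 1, 2]
    print(accept_longest_prefix(draft, target))
```

In this example only the first two draft tokens are accepted: the fourth token matches the target but is still discarded because it follows a mismatch.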

Key Parameters

Parameter	Default	Description
`--num-layers`	4	Number of draft transformer layers
`--num-depths`	4	Number of parallel prediction depths
`--down-sample-ratio`	0.7	Geometric decay ratio for COD sampling
`--down-sample-ratio-min`	0.2	Minimum retention floor for COD sampling
`--no-norm-before-residual`	—	Disable normalization before residual connections
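
For readers who want to see how these flags and defaults fit together, the sketch below mirrors the table with Python's standard `argparse` module. It is a hypothetical stand-in, not the project's actual command-line parser: only the flag names and default values come from the table, and treating `--no-norm-before-residual` as a boolean switch is an assumption.

```python
import argparse


def build_parser():
    """Build a hypothetical parser that mirrors the parameter table."""
    parser = argparse.ArgumentParser(
        description="Illustrative P-EAGLE options (not the real CLI)"
    )
    parser.add_argument("--num-layers", type=int, default=4,
                        help="Number of draft transformer layers")
    parser.add_argument("--num-depths", type=int, default=4,
                        help="Number of parallel prediction depths")
    parser.add_argument("--down-sample-ratio", type=float, default=0.7,
                        help="Geometric decay ratio for COD sampling")
    parser.add_argument("--down-sample-ratio-min", type=float, default=0.2,
                        help="Minimum retention floor for COD sampling")
    # Assumed to be a plain on/off switch that defaults to off.
    parser.add_argument("--no-norm-before-residual", action="store_true",
                        help="Disable normalization before residual connections")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--num-depths", "6"])
    print(args)
```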

Pretrained Models

There are currently no pretrained P-EAGLE models available. You can train your own using the tutorials linked below.

Research & Citation

P-EAGLE is based on research from AWS AI Labs (arXiv preprint arXiv:2602.01469):

@article{hui2026peagle,
  title={P-EAGLE: Parallel-Drafting EAGLE with Scalable Training},
  author={Hui, Mude and Huang, Xin and Salas, Jaime Campos and Sun, Yue and Pemberton, Nathan and Song, Xiang and Khetan, Ashish and Karypis, George},
  journal={arXiv preprint arXiv:2602.01469},
  year={2026}
}