speculators
Speculators: A Unified Library for Speculative Decoding Algorithms for LLMs
Speculators provides a standardized framework for creating, representing, and storing speculative decoding algorithms for large language model (LLM) inference. It enables developers to implement and productize various speculative decoding approaches with a consistent interface, making them ready for integration with LLM inference servers like vLLM.
Speculative decoding is a technique that can significantly improve LLM inference performance by predicting multiple tokens with a smaller, speculative model and then verifying the predictions with the original, larger model. This approach trades off extra computation for reduced latency, making it suitable for real-time applications on deployments that are not compute-constrained.
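The draft-then-verify loop described above can be sketched as follows. This is a toy, greedy illustration and not the Speculators API: `draft_next` and `target_next` are hypothetical stand-ins for the small and large models, and a real verifier would score all drafted positions in one batched forward pass rather than one call per token.

```python
from typing import Callable, List

# Hypothetical interface: a greedy "model" maps a token sequence to its next token.
NextToken = Callable[[List[int]], int]


def speculative_step(
    prefix: List[int], draft_next: NextToken, target_next: NextToken, k: int
) -> List[int]:
    """Propose k draft tokens and keep the part the target model agrees with."""
    # 1. The draft model proposes k tokens autoregressively (cheap).
    drafted: List[int] = []
    for _ in range(k):
        drafted.append(draft_next(prefix + drafted))

    # 2. The target model checks each drafted position in order.
    accepted: List[int] = []
    for token in drafted:
        expected = target_next(prefix + accepted)
        if token != expected:
            # First mismatch: fall back to the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(token)

    # 3. Every draft was accepted, so the target contributes one extra token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def generate(
    prefix: List[int],
    draft_next: NextToken,
    target_next: NextToken,
    k: int,
    max_new_tokens: int,
) -> List[int]:
    """Repeat speculative steps until max_new_tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new_tokens:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[: len(prefix) + max_new_tokens]
```

With greedy verification the output matches what the target model would have produced on its own; the gain comes from accepting several tokens per verification step when the draft model guesses well.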
The library offers a modular architecture with components for:
- Standardized interfaces for working with speculative decoding algorithms that build on top of Transformers pathways for simple integration.
- Centralized definition, configuration, and validation of speculative decoding algorithms.
Modules:
- config – Configuration classes for Speculators library.
- convert – Checkpoint conversion utilities for Speculators.
- data_generation – Data generation utilities for EAGLE-style speculative decoding training.
- model – Base model classes for the Speculators library.
- models
- proposals
- train
- utils
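Classes:
- Eagle3DraftModel
- Eagle3SpeculatorConfig – Configuration for EAGLE-3 speculator with vocabulary mapping.
- SpeculatorModel – Abstract base class for all speculator models in the Speculators library.
- SpeculatorModelConfig – The base config for a speculator model and implementation which defines the hyperparameters and settings required to implement a speculator model.
- SpeculatorsConfig – The base config for a spec decode implementation which defines the parameters required to implement a speculators algorithm for the parent speculator model.
- TokenProposalConfig – The base config for a token proposal method which defines how tokens are generated by the speculator, how they are passed to the verifier, and how they are scored for acceptance or rejection.
- VerifierConfig – The base config for a verifier model which defines the parameters that are required to either load the original verifier model or validate compatibility with a new verifier.
Functions:
- reload_schemas – Automatically populates the registry for all PydanticClassRegistryMixin subclasses and reloads schemas for all Config classes.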
Eagle3DraftModel
Bases: DraftVocabMixin, SpeculatorModel
Methods:
- from_training_args – Create Eagle3 model from training arguments.
- get_trainer_kwargs – Get training and validation kwargs for Eagle3.
Source code in speculators/models/eagle3/core.py
from_training_args classmethod
from_training_args(
    verifier_config: PretrainedConfig,
    t2d: Tensor | None = None,
    d2t: Tensor | None = None,
    **kwargs,
) -> Eagle3DraftModel
Create Eagle3 model from training arguments.
Args:
    verifier_config: Verifier model configuration
    **kwargs: Training arguments with Eagle3-specific params
        - num_layers: Number of decoder layers
        - norm_before_residual: Whether to normalize before residual connection
        - t2d: Target-to-draft vocabulary mapping tensor
        - d2t: Draft-to-target vocabulary mapping tensor
        - ttt_steps: Number of TTT steps
        - verifier_name_or_path: Path to verifier model
Returns: Initialized Eagle3DraftModel
Source code in speculators/models/eagle3/core.py
get_trainer_kwargs staticmethod
Get training and validation kwargs for Eagle3.
Args: **kwargs: Training arguments
Returns: Tuple of (train_call_kwargs, val_call_kwargs)
Source code in speculators/models/eagle3/core.py
Eagle3SpeculatorConfig
Bases: SpeculatorModelConfig
Configuration for EAGLE-3 speculator with vocabulary mapping.
EAGLE-3 features vocabulary mapping between draft (32K) and target (128K) vocabularies, enabling cross-tokenizer speculation.
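A minimal sketch of how a draft-to-target (d2t) and target-to-draft (t2d) mapping could work is shown below, using plain lists instead of tensors. The layout is an assumption made for illustration, not something this page specifies: here `d2t[i]` is an offset such that `target_id = i + d2t[i]`, and `t2d[j]` is a boolean marking whether target token `j` exists in the draft vocabulary. `build_vocab_maps` and `draft_to_target` are hypothetical helper names.

```python
from typing import List, Tuple


def build_vocab_maps(
    kept_target_ids: List[int], target_vocab_size: int
) -> Tuple[List[int], List[bool]]:
    """Build (d2t, t2d) for a draft vocabulary made of selected target ids.

    Assumed layout (illustrative only):
      d2t[i] is the offset so that target_id = i + d2t[i]
      t2d[j] is True when target id j is present in the draft vocabulary
    """
    kept = sorted(set(kept_target_ids))
    # Draft ids are the positions of the kept target ids in sorted order.
    d2t = [target_id - draft_id for draft_id, target_id in enumerate(kept)]
    t2d = [False] * target_vocab_size
    for target_id in kept:
        t2d[target_id] = True
    return d2t, t2d


def draft_to_target(draft_id: int, d2t: List[int]) -> int:
    """Translate a token id from the draft vocabulary to the target vocabulary."""
    return draft_id + d2t[draft_id]


# Toy example: a 3-token draft vocabulary carved out of an 8-token target vocabulary.
d2t, t2d = build_vocab_maps([7, 1, 4], target_vocab_size=8)
mapped = [draft_to_target(i, d2t) for i in range(len(d2t))]
```

The point of the mapping is that the draft model can predict over a much smaller output vocabulary while its proposals are still expressed in the target model's token ids before verification.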
Parameters:
- transformer_layer_config – Configuration for the transformer decoder layer
- draft_vocab_size – Size of draft model vocabulary for speculation
- norm_before_residual – Apply hidden_norm before storing residual
Methods:
- serialize_transformer_config – Serialize transformer config to dict.
- validate_transformer_config – Validate and convert transformer config.
Attributes:
- target_vocab_size (int) – Get target vocabulary size from transformer config.
Source code in speculators/config.py
target_vocab_size property
Get target vocabulary size from transformer config.
serialize_transformer_config
validate_transformer_config classmethod
Validate and convert transformer config.
Source code in speculators/models/eagle3/config.py
SpeculatorModel
Bases: ClassRegistryMixin, PreTrainedModel
Abstract base class for all speculator models in the Speculators library.
This class provides the foundation for implementing speculative decoding models that can generate candidate tokens to be verified by a base verifier model. It combines the functionality of Hugging Face's PreTrainedModel and GenerationMixin with automatic model registration and discovery capabilities. All concrete speculator model implementations must inherit from this class, register with SpeculatorModel.register(NAME), and implement the abstract forward method.
Example:
# Load a speculator model with automatic class resolution
model = SpeculatorModel.from_pretrained("path/to/speculator")
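The register-and-implement contract described above can be illustrated with a small, self-contained registry. This is a hedged sketch of the general pattern only: `ToySpeculator`, its `registry` dict, and `EchoSpeculator` are hypothetical names, and the library's ClassRegistryMixin may behave differently.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class ToySpeculator(ABC):
    """Hypothetical stand-in for a registry-backed abstract base class."""

    registry: Dict[str, type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        """Return a class decorator that records the subclass under `name`."""

        def decorator(subclass: type) -> type:
            if name in cls.registry:
                raise ValueError(f"{name!r} is already registered")
            cls.registry[name] = subclass
            return subclass

        return decorator

    @abstractmethod
    def forward(self, input_ids: List[int]) -> List[int]:
        """Concrete speculators must implement the forward pass."""


@ToySpeculator.register("toy_echo")
class EchoSpeculator(ToySpeculator):
    def forward(self, input_ids: List[int]) -> List[int]:
        # Trivial behaviour, just to satisfy the abstract method.
        return list(input_ids)
```

Because `forward` is abstract, the base class cannot be instantiated directly, and the name passed to `register` is what later lookups use to find the concrete class.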
Initialize a SpeculatorModel instance.
Parameters:
- config (SpeculatorModelConfig) – The configuration for the speculator model. Must be a SpeculatorModelConfig instance containing model hyperparameters and speculative decoding settings.
- kwargs – Additional keyword arguments passed to the parent PreTrainedModel constructor.
Methods:
- from_pretrained – Load a pretrained speculator model from the Hugging Face Hub or local directory.
- from_training_args – Create model instance from training arguments.
- get_trainer_kwargs – Get algorithm-specific kwargs for training and validation.
- registered_model_class_from_config – Looks up the appropriate speculator model class from the registry based on the configuration type.
- verify_training_compatible – Verify that a model instance is compatible with training infrastructure.
Source code in speculators/model.py
from_pretrained classmethod
from_pretrained(
    pretrained_model_name_or_path: str | PathLike | None,
    *model_args,
    config: PretrainedConfig | str | PathLike | None = None,
    cache_dir: str | PathLike | None = None,
    ignore_mismatched_sizes: bool = False,
    force_download: bool = False,
    local_files_only: bool = False,
    token: str | bool | None = None,
    revision: str = "main",
    use_safetensors: bool | None = None,
    weights_only: bool = True,
    t2d: Tensor | None = None,
    d2t: Tensor | None = None,
    **kwargs,
) -> SpeculatorModel
Load a pretrained speculator model from the Hugging Face Hub or local directory.
This method automatically resolves the correct speculator model class based on the configuration type and loads the model with the appropriate weights. If called on the base SpeculatorModel class, it will automatically determine and instantiate the correct subclass based on the model configuration.
Example:
# Load with automatic class resolution
model = SpeculatorModel.from_pretrained("RedHatAI/speculator-llama-7b")

# Load from local directory
model = SpeculatorModel.from_pretrained("./my_speculator")

# Load with custom config
# (state_dict is assumed to hold weights that were loaded beforehand)
config = SpeculatorModelConfig.from_pretrained("RedHatAI/eagle-llama-7b")
model = SpeculatorModel.from_pretrained(
    None, config=config, state_dict=state_dict
)
Parameters:
- pretrained_model_name_or_path (str | PathLike | None) – The model identifier on Hugging Face Hub, or path to a local directory containing the model files. Can be None if config is provided as a path.
- model_args – Additional positional arguments passed to the model constructor.
- config (PretrainedConfig | str | PathLike | None, default: None) – Optional configuration for the model. Can be a SpeculatorModelConfig instance, a path to a config file, or None to load from model directory.
- cache_dir (str | PathLike | None, default: None) – Directory to cache downloaded files. If None, uses default transformers cache directory.
- ignore_mismatched_sizes (bool, default: False) – Whether to ignore size mismatches when loading pretrained weights. Useful for loading models with different architectures.
- force_download (bool, default: False) – Whether to force re-download of model files even if they exist in cache.
- local_files_only (bool, default: False) – Whether to avoid downloading files and only use local cached files. Raises an error if files are not found locally.
- token (str | bool | None, default: None) – Optional authentication token for accessing private models on Hugging Face Hub. Can be a string token or True to use saved token.
- revision (str, default: 'main') – The specific model revision to load (branch name, tag, or commit hash). Defaults to "main".
- use_safetensors (bool | None, default: None) – Whether to use safetensors format for loading weights. If None, automatically detects the available format.
- weights_only (bool, default: True) – Whether to only load model weights without optimizer states or other training artifacts.
- kwargs – Additional keyword arguments passed to the model constructor and loading process.
Returns:
- SpeculatorModel – A SpeculatorModel instance of the appropriate subclass, loaded with the pretrained weights and configuration.
Source code in speculators/model.py
from_training_args abstractmethod classmethod
Create model instance from training arguments.
This factory method is used by the training script to instantiate models from command-line arguments. Each algorithm must implement this to support the training infrastructure.
Args:
    verifier_config: Configuration from the verifier/base model.
    **kwargs: Training arguments as keyword arguments. Each algorithm extracts the parameters it needs.
Returns: Initialized model instance ready for training.
Example:
@classmethod
def from_training_args(cls, verifier_config, **kwargs):
    config = MySpeculatorConfig(
        transformer_layer_config=verifier_config,
        num_layers=kwargs['num_layers'],
        # ...
    )
    return cls(config=config, t2d=kwargs.get('t2d'), d2t=kwargs.get('d2t'))
Source code in speculators/model.py
get_trainer_kwargs abstractmethod staticmethod
Get algorithm-specific kwargs for training and validation.
This method extracts algorithm-specific parameters from the training arguments and returns separate kwargs dictionaries for training and validation forward passes.
Args: **kwargs: Training arguments containing algorithm-specific parameters.
Returns: Tuple of (train_kwargs, val_kwargs) where:
    - train_kwargs: Dict passed to model.forward() during training
    - val_kwargs: Dict passed to model.forward() during validation
Example:
@staticmethod
def get_trainer_kwargs(**kwargs):
    train_kwargs = {
        "num_steps": kwargs["num_steps"],
        "use_special_mode": True,
    }
    val_kwargs = {
        "num_steps": kwargs["num_steps"],
        "use_special_mode": False,
    }
    return train_kwargs, val_kwargs
Source code in speculators/model.py
registered_model_class_from_config classmethod
Looks up the appropriate speculator model class from the registry based on the configuration type. It matches the config class to the corresponding model class that was registered during auto-discovery or manual registration.
Parameters:
- config (SpeculatorModelConfig) – The configuration for which to find the registered model class. Must be an instance of a SpeculatorModelConfig subclass.
Returns:
- type[SpeculatorModel] – The registered model class that matches the configuration type.
Source code in speculators/model.py
verify_training_compatible classmethod
Verify that a model instance is compatible with training infrastructure.
This method validates that the given model:
1. Is an instance of SpeculatorModel
2. Is registered in the SpeculatorModel registry
3. Has a layers attribute (required for FSDP wrapping)
Args:
    model: The model instance to verify
Raises:
    TypeError: If model is not a SpeculatorModel instance
    ValueError: If model's class is not in the registry
    AttributeError: If model doesn't have a layers attribute
Source code in speculators/model.py
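The three checks listed for verify_training_compatible, and the exception each one raises, can be sketched with toy classes. This is an illustrative approximation, not the library's code: `ToyBase`, its `registry` dict, and the model classes below are hypothetical.

```python
class ToyBase:
    """Hypothetical stand-in for the registry-backed speculator base class."""

    registry = {}


class RegisteredModel(ToyBase):
    def __init__(self):
        # A `layers` attribute is what a wrapping step would iterate over.
        self.layers = ["layer0"]


class UnregisteredModel(ToyBase):
    def __init__(self):
        self.layers = []


ToyBase.registry["registered"] = RegisteredModel


def verify_training_compatible(model) -> None:
    """Raise a specific error for each failed compatibility check."""
    if not isinstance(model, ToyBase):
        raise TypeError(f"{type(model).__name__} is not a ToyBase instance")
    if type(model) not in ToyBase.registry.values():
        raise ValueError(f"{type(model).__name__} is not in the registry")
    if not hasattr(model, "layers"):
        raise AttributeError(f"{type(model).__name__} has no 'layers' attribute")
```

Running the checks in this order means the most basic problem (wrong type) is reported first, before registry or attribute issues.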
SpeculatorModelConfig
Bases: PydanticClassRegistryMixin, PretrainedConfig
The base config for a speculator model and implementation which defines the hyperparameters and settings required to implement a speculator model. It includes details on the speculator model architecture along with the speculators config describing the algorithm, token proposals, and verifier model.
It inherits from the Transformers PretrainedConfig class to ensure full compatibility with standard Transformers model pathways while building on the standard methods for PretrainedConfigs to load, save, and push to the HF hub.
This is the main config which maps to the config.json file for saved speculators.
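Methods:
- from_dict – Create a SpeculatorModelConfig from a dictionary, automatically instantiating the correct subclass based on the speculators_model_type field.
- from_pretrained – Load a SpeculatorModelConfig from the name/id of a model on the Hugging Face Hub or from a local directory.
- to_dict – Return a dictionary representation of the full config, including the PretrainedConfig variables and Pydantic model fields.
- to_diff_dict – Return a dictionary representation of a simplified config, including only the PretrainedConfig fields that have been modified or set, along with all Pydantic fields.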
Source code in speculators/config.py
from_dict classmethod
Create a SpeculatorModelConfig from a dictionary, automatically instantiating the correct subclass based on the speculators_model_type field.
Parameters:
- config_dict (dict[str, Any]) – Dictionary containing the configuration
- kwargs – Additional keyword arguments that override config values
Returns:
- SpeculatorModelConfig – A SpeculatorModelConfig instance of the appropriate subclass
Source code in speculators/config.py
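Selecting a subclass from a discriminator field, as from_dict does with speculators_model_type, can be sketched with dataclasses. Only the field name `speculators_model_type` comes from this page; `ToyConfig`, `ToyEagle3Config`, and `CONFIG_REGISTRY` are hypothetical, and the library itself relies on Pydantic and its own registry rather than this code.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical registry mapping discriminator values to config classes.
CONFIG_REGISTRY: Dict[str, type] = {}


@dataclass
class ToyConfig:
    speculators_model_type: str = "base"

    @classmethod
    def from_dict(cls, config_dict: Dict[str, Any], **kwargs: Any) -> "ToyConfig":
        # Keyword arguments override values from the dictionary.
        merged = {**config_dict, **kwargs}
        model_type = merged.get("speculators_model_type", "base")
        if model_type not in CONFIG_REGISTRY:
            raise ValueError(f"Unknown speculators_model_type: {model_type!r}")
        # Instantiate the subclass that matches the discriminator value.
        return CONFIG_REGISTRY[model_type](**merged)


@dataclass
class ToyEagle3Config(ToyConfig):
    speculators_model_type: str = "eagle3"
    draft_vocab_size: int = 32000


CONFIG_REGISTRY["base"] = ToyConfig
CONFIG_REGISTRY["eagle3"] = ToyEagle3Config
```

Calling `ToyConfig.from_dict({"speculators_model_type": "eagle3"})` on the base class returns a `ToyEagle3Config`, which mirrors the "call the base, get the right subclass" behaviour described for from_dict.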
from_pretrained classmethod
from_pretrained(
    pretrained_model_name_or_path: str | PathLike,
    cache_dir: str | PathLike | None = None,
    force_download: bool = False,
    local_files_only: bool = False,
    token: str | bool | None = None,
    revision: str = "main",
    **kwargs,
) -> SpeculatorModelConfig
Load a SpeculatorModelConfig from the name/id of a model on the Hugging Face Hub or from a local directory. Will automatically instantiate the correct config from speculators.models package.
Parameters:
- pretrained_model_name_or_path (str | PathLike) – The name or path to the pretrained model.
- cache_dir (str | PathLike | None, default: None) – The directory to cache the config in.
- force_download (bool, default: False) – Whether to force download the config from the Hub.
- local_files_only (bool, default: False) – Whether to use local files, not download from the Hub.
- token (str | bool | None, default: None) – The token to use for authentication with the Hub.
- revision (str, default: 'main') – The revision of the config to load from the Hub.
- kwargs – Additional keyword arguments to pass to the config.
Returns:
- SpeculatorModelConfig – A SpeculatorModelConfig object with the loaded parameters.
Source code in speculators/config.py
to_dict
Returns:
- dict[str, Any] – A dictionary representation of the full config, including the PretrainedConfig variables and Pydantic model fields.
Source code in speculators/config.py
to_diff_dict
Returns:
- dict[str, Any] – A dictionary representation of a simplified config, including only the PretrainedConfig fields that have been modified or set, along with all Pydantic fields.
Source code in speculators/config.py
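The difference between to_dict and to_diff_dict can be sketched with a dataclass that compares itself against a default instance. This is an assumption-laden toy: `ToyModelConfig`, its fields, and the `ALWAYS_KEPT` set (standing in for "all Pydantic fields") are hypothetical and do not reflect how the library computes the diff.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict

# Hypothetical: fields that are always serialized, standing in for Pydantic fields.
ALWAYS_KEPT = {"draft_vocab_size"}


@dataclass
class ToyModelConfig:
    hidden_size: int = 4096
    num_layers: int = 1
    draft_vocab_size: int = 32000

    def to_dict(self) -> Dict[str, Any]:
        """Full representation: every field, default or not."""
        return asdict(self)

    def to_diff_dict(self) -> Dict[str, Any]:
        """Simplified representation: changed fields plus the always-kept ones."""
        defaults = type(self)()
        diff: Dict[str, Any] = {}
        for f in fields(self):
            value = getattr(self, f.name)
            if f.name in ALWAYS_KEPT or value != getattr(defaults, f.name):
                diff[f.name] = value
        return diff


cfg = ToyModelConfig(num_layers=2)
full = cfg.to_dict()
diff = cfg.to_diff_dict()
```

Here `full` contains all three fields, while `diff` drops `hidden_size` because it still holds its default value.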
SpeculatorsConfig
Bases: ReloadableBaseModel
The base config for a speculative decoding implementation, which defines the parameters required to implement a speculators algorithm for the parent speculator model. It includes details on the algorithm, token proposals, and the verifier model.
Methods:
- check_default_proposal_method – Validate default_proposal_method is one of the proposal_methods.
check_default_proposal_method
Validate default_proposal_method is one of the proposal_methods.
Source code in speculators/config.py
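The consistency check between default_proposal_method and proposal_methods can be sketched as below. The real class uses a Pydantic validator; this version uses a dataclass `__post_init__` so that it runs without extra dependencies. The field types are assumptions: `proposal_methods` is modelled as a list of objects with a `proposal_type` string, and `ToyProposal` and `ToySpeculatorsConfig` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyProposal:
    # Assumed discriminator for a proposal method.
    proposal_type: str


@dataclass
class ToySpeculatorsConfig:
    algorithm: str
    proposal_methods: List[ToyProposal]
    default_proposal_method: str

    def __post_init__(self) -> None:
        # The default must name one of the configured proposal methods.
        available = [p.proposal_type for p in self.proposal_methods]
        if self.default_proposal_method not in available:
            raise ValueError(
                f"default_proposal_method {self.default_proposal_method!r} "
                f"must be one of {available}"
            )


valid = ToySpeculatorsConfig(
    algorithm="eagle3",
    proposal_methods=[ToyProposal("greedy")],
    default_proposal_method="greedy",
)
```

Validating at construction time means an inconsistent config fails when it is loaded rather than later, when a proposal method is looked up.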
TokenProposalConfig
Bases: PydanticClassRegistryMixin
The base config for a token proposal method which defines how tokens are generated by the speculator, how they are passed to the verifier, and how they are scored for acceptance or rejection. All implementations of token proposal methods must inherit from this class, set the proposal_type to a unique value, and add any additional parameters needed to instantiate and implement the method.
It uses pydantic to validate the parameters, provide default values, and enable automatic serialization and deserialization of the correct class types based on the proposal_type field.
VerifierConfig
Bases: BaseModel
The base config for a verifier model which defines the parameters that are required to either load the original verifier model or validate compatibility with a new verifier based on the architecture and tokenizer/processor properties. It provides convenience methods to extract the required parameters from a PretrainedConfig object.
Methods:
- from_config – Create a VerifierConfig from a PretrainedConfig object.
from_config classmethod
Create a VerifierConfig from a PretrainedConfig object. Used to extract the required parameters from the original verifier config and create a VerifierConfig object.
Parameters:
- config (PretrainedConfig) – The PretrainedConfig object to extract the parameters from.
- name_or_path (str | None, default: 'UNSET') – The name or path for the verifier model. Set to None to not add a specific name_or_path. If not provided, the name_or_path from the config will be used.
Returns:
- VerifierConfig – A VerifierConfig object with the extracted parameters.
Source code in speculators/config.py
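The 'UNSET' default lets from_config distinguish "argument not provided" from an explicit None. A minimal sketch of that sentinel pattern follows; `ToyVerifierConfig` and its `architectures` field are assumptions for illustration, and a `SimpleNamespace` stands in for a PretrainedConfig.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import List, Optional

UNSET = "UNSET"  # sentinel meaning "caller did not pass name_or_path"


@dataclass
class ToyVerifierConfig:
    name_or_path: Optional[str] = None
    architectures: List[str] = field(default_factory=list)

    @classmethod
    def from_config(
        cls, config, name_or_path: Optional[str] = UNSET
    ) -> "ToyVerifierConfig":
        if name_or_path == UNSET:
            # Not provided: fall back to the value stored on the config.
            name_or_path = getattr(config, "name_or_path", None)
        # An explicit None is kept as None, so no name is recorded.
        return cls(
            name_or_path=name_or_path,
            architectures=list(getattr(config, "architectures", None) or []),
        )


# Stand-in for a verifier's PretrainedConfig.
base = SimpleNamespace(name_or_path="org/base-model", architectures=["LlamaForCausalLM"])

from_default = ToyVerifierConfig.from_config(base)
without_name = ToyVerifierConfig.from_config(base, name_or_path=None)
overridden = ToyVerifierConfig.from_config(base, name_or_path="./local-verifier")
```

A plain `None` default could not express all three cases, which is why a separate sentinel value is used.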
reload_schemas
Automatically populates the registry for all PydanticClassRegistryMixin subclasses and reloads schemas for all Config classes to ensure their schemas are up-to-date with the current registry state.