speculators.train.vocab_mapping
Vocabulary mapping utilities for draft model training.
Functions:
- build_vocab_mappings_from_distribution – Build vocabulary mappings for draft model from token frequency distribution.
- save_token_frequency_distribution – Save token frequency distribution from the dataset.
build_vocab_mappings_from_distribution
build_vocab_mappings_from_distribution(
token_freq_dict: dict[int, int],
draft_vocab_size: int,
target_vocab_size: int,
) -> tuple[torch.Tensor, torch.Tensor]
Build vocabulary mappings for draft model from token frequency distribution.
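As an illustration of the idea only, the sketch below builds a pair of mappings from a frequency dictionary using plain Python lists. It is not the library's implementation: the helper name `sketch_vocab_mappings`, the rule "keep the `draft_vocab_size` most frequent tokens", and the meaning given to the two return values (a draft-index-to-target-id list and a membership mask over the target vocabulary) are all assumptions. The documented function returns two `torch.Tensor` objects whose exact contents are not described here.

```python
def sketch_vocab_mappings(
    token_freq_dict: dict[int, int],
    draft_vocab_size: int,
    target_vocab_size: int,
) -> tuple[list[int], list[bool]]:
    """Hypothetical sketch: keep the most frequent target tokens for the draft vocab."""
    # Rank token ids by descending count; break ties by token id so the
    # result is deterministic.
    ranked = sorted(token_freq_dict.items(), key=lambda kv: (-kv[1], kv[0]))

    # Assumed selection rule: the top `draft_vocab_size` tokens, listed in
    # ascending target-id order. If fewer distinct tokens were observed,
    # the list is simply shorter.
    kept = sorted(tok for tok, _ in ranked[:draft_vocab_size])

    # Assumed mapping 1: draft index i -> target token id.
    draft_to_target = kept

    # Assumed mapping 2: for each target token id, whether it is in the draft vocab.
    target_in_draft = [False] * target_vocab_size
    for tok in kept:
        target_in_draft[tok] = True

    return draft_to_target, target_in_draft


# Example: a 10-token target vocabulary reduced to a 2-token draft vocabulary.
draft_to_target, target_in_draft = sketch_vocab_mappings(
    {0: 5, 3: 50, 7: 20, 9: 1}, draft_vocab_size=2, target_vocab_size=10
)
```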
Source code in speculators/train/vocab_mapping.py
combine_token_frequency_distributions
combine_token_frequency_distributions(
token_freq_paths: list[str | Path],
output_path: str | Path,
)
Combine multiple token frequency distributions into a single file.
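The on-disk format of a frequency distribution file is not described here, so the sketch below only illustrates the merge step on in-memory dictionaries. It assumes that combining means summing per-token counts; the helper name `merge_token_counts` is hypothetical and not part of the module.

```python
from collections import Counter


def merge_token_counts(distributions: list[dict[int, int]]) -> dict[int, int]:
    """Hypothetical sketch: sum per-token counts across several distributions."""
    total: Counter[int] = Counter()
    for dist in distributions:
        # Counter.update adds counts for keys that already exist.
        total.update(dist)
    return dict(total)


merged = merge_token_counts([{1: 2, 5: 1}, {1: 3, 7: 4}])
```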
Source code in speculators/train/vocab_mapping.py
save_token_frequency_distribution
Save token frequency distribution from the dataset.
Args:
    dataset: HuggingFace dataset with input_ids and loss_mask
    output_path: Path where to save the token frequency distribution
Returns:
    Path to the saved frequency distribution file
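A minimal sketch of the counting step, under stated assumptions: each example is treated as a plain dictionary with `input_ids` and `loss_mask` lists of equal length, and only tokens whose mask value is non-zero are counted. Whether the library filters by `loss_mask` in this way is an assumption, and the helper name `count_masked_tokens` is hypothetical. Saving the result to `output_path` is omitted because the file format is not described here.

```python
from collections import Counter


def count_masked_tokens(examples: list[dict[str, list[int]]]) -> dict[int, int]:
    """Hypothetical sketch: count token ids at positions where loss_mask is non-zero."""
    counts: Counter[int] = Counter()
    for example in examples:
        for token_id, keep in zip(example["input_ids"], example["loss_mask"]):
            if keep:  # assumed rule: skip positions masked out of the loss
                counts[token_id] += 1
    return dict(counts)


# Stand-in for a dataset: two short examples.
examples = [
    {"input_ids": [1, 2, 2, 3], "loss_mask": [0, 1, 1, 1]},
    {"input_ids": [2, 4], "loss_mask": [1, 0]},
]
token_counts = count_masked_tokens(examples)
```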