vllm.v1.worker.gpu.spec_decode.utils
Functions:
- get_parallel_drafting_token_id – Resolve the mask token id used for parallel drafting slots.
get_parallel_drafting_token_id(hf_config)
Resolve the mask token id used for parallel drafting slots.
Checks these config fields in order and uses the first one that is present: dflash_config.mask_token_id, pard_token, ptd_token_id. Raises ValueError if none are present.
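The lookup order described above can be sketched as follows. This is a minimal illustration, not the vLLM implementation: the names `resolve_mask_token_id` and `_lookup` are hypothetical, and the sketch assumes that each field may be exposed either as an attribute or as a dict key, and that an absent field is equivalent to `None`.

```python
from types import SimpleNamespace
from typing import Any


def _lookup(obj: Any, name: str) -> Any:
    """Hypothetical helper: read `name` from a dict or an attribute-style
    object, returning None when the container or the field is absent."""
    if obj is None:
        return None
    if isinstance(obj, dict):
        return obj.get(name)
    return getattr(obj, name, None)


def resolve_mask_token_id(hf_config: Any) -> int:
    """Illustrative stand-in for the documented lookup order (assumption:
    the first field that is present wins)."""
    # 1. Nested field: dflash_config.mask_token_id
    token_id = _lookup(_lookup(hf_config, "dflash_config"), "mask_token_id")
    if token_id is not None:
        return token_id

    # 2. and 3. Top-level fields, checked in the documented order
    for name in ("pard_token", "ptd_token_id"):
        token_id = _lookup(hf_config, name)
        if token_id is not None:
            return token_id

    # None of the three fields is present
    raise ValueError(
        "No mask token id found: expected one of "
        "dflash_config.mask_token_id, pard_token, ptd_token_id"
    )


# Usage with a stand-in config object (not a real Hugging Face config)
config = SimpleNamespace(pard_token=9, ptd_token_id=11)
print(resolve_mask_token_id(config))  # prints 9: pard_token precedes ptd_token_id
```

Checking the fields in a fixed priority order keeps the result deterministic when a config happens to define more than one of them, and raising `ValueError` surfaces a misconfigured draft model early instead of silently falling back to an arbitrary token id.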