vllm.v1.worker.gpu.spec_decode.dflash.utils
Functions:
- get_dflash_causal – Whether the DFlash draft uses causal (vs non-causal) attention.
get_dflash_causal(draft_model_config)
Whether the DFlash draft uses causal (vs non-causal) attention.
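
This page does not show how the flag is computed or consumed, so the following is only an illustrative sketch of the causal vs non-causal distinction, not vLLM code. The helper name `build_attention_mask` is hypothetical and does not exist in vLLM. It shows the general meaning of the boolean: with causal attention each position attends only to itself and earlier positions, while with non-causal attention every position attends to all positions.

```python
def build_attention_mask(num_tokens: int, causal: bool) -> list[list[bool]]:
    """Hypothetical helper (not part of vLLM).

    mask[i][j] is True when query position i may attend to key position j.
    """
    if causal:
        # Lower-triangular mask: position i sees positions 0..i only.
        return [[j <= i for j in range(num_tokens)] for i in range(num_tokens)]
    # Non-causal (bidirectional): every position sees every position.
    return [[True] * num_tokens for _ in range(num_tokens)]


if __name__ == "__main__":
    for flag in (True, False):
        print(f"causal={flag}")
        for row in build_attention_mask(3, causal=flag):
            print("  ", ["x" if allowed else "." for allowed in row])
```

In a real attention backend, a boolean like the one returned by get_dflash_causal would typically select between these two masking behaviours inside the attention kernel rather than materialise a mask in Python.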