vllm.v1.worker.gpu.spec_decode.dflash.utils

Functions:

  • get_dflash_causal

    Whether the DFlash draft uses causal (vs non-causal) attention.

get_dflash_causal(draft_model_config)

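Return whether the DFlash draft model uses causal (rather than non-causal) attention. The flag is read from the "causal" key of dflash_config on the draft model's hf_config, and defaults to False (non-causal) when dflash_config is missing, empty, or has no "causal" key.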

Source code in vllm/v1/worker/gpu/spec_decode/dflash/utils.py
def get_dflash_causal(draft_model_config: ModelConfig) -> bool:
    """Whether the DFlash draft uses causal (vs non-causal) attention."""
    dflash_config = getattr(draft_model_config.hf_config, "dflash_config", None) or {}
    return dflash_config.get("causal", False)
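A minimal sketch of the fallback behavior, assuming dflash_config is a plain dict when present. To stay runnable without vLLM installed, it uses a local copy of the function with the ModelConfig annotation dropped, and types.SimpleNamespace objects in place of real config classes. The make_config helper is hypothetical and exists only for this illustration.

```python
from types import SimpleNamespace


def get_dflash_causal(draft_model_config) -> bool:
    """Local copy of the function above, minus the vLLM type annotation."""
    dflash_config = getattr(draft_model_config.hf_config, "dflash_config", None) or {}
    return dflash_config.get("causal", False)


def make_config(**hf_attrs):
    """Hypothetical helper: a stand-in object exposing only `hf_config`."""
    return SimpleNamespace(hf_config=SimpleNamespace(**hf_attrs))


# Four cases: attribute absent, attribute set to None, empty dict, flag enabled.
configs = [
    make_config(),
    make_config(dflash_config=None),
    make_config(dflash_config={}),
    make_config(dflash_config={"causal": True}),
]

results = [get_dflash_causal(c) for c in configs]
print(results)  # [False, False, False, True]
```

The `or {}` is what makes the second case safe: a dflash_config explicitly set to None would otherwise raise AttributeError on the .get call.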