vllm_omni.diffusion.attention.backends.flash_attn
FlashAttentionBackend
Bases: AttentionBackend
FlashAttentionImpl
Bases: AttentionImpl
forward_cuda
forward_cuda(
    query: Tensor,
    key: Tensor,
    value: Tensor,
    attn_metadata: AttentionMetadata = None,
) -> Tensor
CUDA/ROCm/MUSA flash attention implementation.
forward_fa_npu
forward_fa_npu(
    query: Tensor,
    key: Tensor,
    value: Tensor,
    attn_metadata: AttentionMetadata = None,
) -> Tensor
forward_fa_quant_npu
forward_fa_quant_npu(
    query: Tensor,
    key: Tensor,
    value: Tensor,
    attn_metadata: AttentionMetadata = None,
) -> Tensor
forward_npu
forward_npu(
    query: Tensor,
    key: Tensor,
    value: Tensor,
    attn_metadata: AttentionMetadata = None,
) -> Tensor
NPU attention implementation using mindiesd.
forward_xpu
forward_xpu(
    query: Tensor,
    key: Tensor,
    value: Tensor,
    attn_metadata: AttentionMetadata = None,
) -> Tensor
XPU flash attention implementation.
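The methods listed here share one signature and differ only in the platform they target. As a rough illustration of how per-platform methods with a common signature might be selected at runtime, the sketch below uses a hypothetical class, `SketchAttentionImpl`. The lookup by method name, the `platform` constructor argument, and the placeholder return values are all assumptions made for illustration; none of them is taken from vllm_omni, and the real class may choose its path differently.

```python
from typing import Any, Callable, Optional


class SketchAttentionImpl:
    """Hypothetical stand-in; not the vllm_omni FlashAttentionImpl."""

    def __init__(self, platform: str) -> None:
        # Resolve the platform-specific method once, at construction time.
        self._forward: Callable[..., Any] = self._select_forward(platform)

    def _select_forward(self, platform: str) -> Callable[..., Any]:
        # Map a platform name such as "cuda" to the method "forward_cuda".
        fn = getattr(self, f"forward_{platform}", None)
        if not callable(fn):
            raise NotImplementedError(f"no attention path for {platform!r}")
        return fn

    def forward(self, query: Any, key: Any, value: Any,
                attn_metadata: Optional[Any] = None) -> Any:
        # Single entry point; delegates to the method chosen in __init__.
        return self._forward(query, key, value, attn_metadata)

    # Placeholder bodies: a real path would call a platform attention kernel.
    def forward_cuda(self, query, key, value, attn_metadata=None):
        return ("cuda", query, key, value)

    def forward_npu(self, query, key, value, attn_metadata=None):
        return ("npu", query, key, value)

    def forward_xpu(self, query, key, value, attn_metadata=None):
        return ("xpu", query, key, value)


impl = SketchAttentionImpl("xpu")
result = impl.forward("q", "k", "v")
```

Resolving the method once in the constructor keeps the per-call path free of platform checks, and an unsupported platform fails early with `NotImplementedError` instead of at the first forward call.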