vllm_omni.diffusion.attention.backends.abstract
AttentionBackend
AttentionImpl
forward
forward(
    query: Tensor,
    key: Tensor,
    value: Tensor,
    attn_metadata: T | None = None,
) -> Tensor
Dispatches to the platform-specific forward implementation (for example forward_cuda, forward_hip, forward_musa, or forward_npu, documented below).
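The dispatch described above can be sketched as follows. This is a minimal illustration, not the vllm_omni implementation: the classes SketchAttentionImpl and TaggingImpl, the platform string argument, and the name-based lookup are all assumptions made for the example, and plain Python objects stand in for tensors. How the real class detects the platform, and whether it falls back from one platform method to another, is not stated in this reference.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")  # stands in for the attention-metadata type parameter


class SketchAttentionImpl(Generic[T]):
    """Hypothetical stand-in for an attention implementation base class."""

    def __init__(self, platform: str) -> None:
        # Assumption: the platform is supplied as a plain string such as "cuda".
        self.platform = platform

    def forward(self, query, key, value, attn_metadata: Optional[T] = None):
        # Look up forward_<platform> by name and delegate to it.
        handler = getattr(self, f"forward_{self.platform}", None)
        if handler is None:
            raise NotImplementedError(f"no forward_{self.platform} method")
        return handler(query, key, value, attn_metadata)

    # Platform hooks; subclasses override the ones they support.
    def forward_cuda(self, query, key, value, attn_metadata=None):
        raise NotImplementedError("forward_cuda is not implemented")

    def forward_hip(self, query, key, value, attn_metadata=None):
        raise NotImplementedError("forward_hip is not implemented")

    def forward_musa(self, query, key, value, attn_metadata=None):
        raise NotImplementedError("forward_musa is not implemented")

    def forward_npu(self, query, key, value, attn_metadata=None):
        raise NotImplementedError("forward_npu is not implemented")


class TaggingImpl(SketchAttentionImpl[dict]):
    """Toy subclass that returns a tag instead of computing attention."""

    def forward_cuda(self, query, key, value, attn_metadata=None):
        return ("cuda", query)

    def forward_npu(self, query, key, value, attn_metadata=None):
        return ("npu", query)


# Lists stand in for tensors; forward() routes the call to forward_cuda.
result = TaggingImpl("cuda").forward([1.0], [2.0], [3.0])
```

In this sketch a caller only ever invokes forward; supporting a new platform means overriding the matching forward_<platform> method in a subclass, and an unsupported platform surfaces as NotImplementedError.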
forward_cuda
forward_cuda(
    query: Tensor,
    key: Tensor,
    value: Tensor,
    attn_metadata: T | None = None,
) -> Tensor
forward_hip
forward_hip(
    query: Tensor,
    key: Tensor,
    value: Tensor,
    attn_metadata: T | None = None,
) -> Tensor
forward_musa
forward_musa(
    query: Tensor,
    key: Tensor,
    value: Tensor,
    attn_metadata: T | None = None,
) -> Tensor
forward_npu
forward_npu(
    query: Tensor,
    key: Tensor,
    value: Tensor,
    attn_metadata: T | None = None,
) -> Tensor