vllm_omni.diffusion.attention.parallel.ring
RingParallelAttention
Ring sequence-parallel strategy.
This strategy prepares inputs for Ring Attention. Key responsibilities:
- Concatenate joint_query (Text) to query (Image) if present.
- Keep joint_key/value separate in metadata for the Ring kernel to handle as a static prefix.
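The two responsibilities above can be sketched roughly as follows. This is an illustration only, not the vllm_omni implementation: `prepare_ring_inputs` is a hypothetical helper, NumPy arrays stand in for `torch.Tensor`, the `(batch, seq_len, heads, head_dim)` layout is assumed, and appending the text tokens after the image tokens is also an assumption, since the concatenation order is not stated here.

```python
import numpy as np


def prepare_ring_inputs(query, joint_query=None, joint_key=None, joint_value=None):
    """Hypothetical helper sketching the input preparation step.

    Assumes tensors are laid out as (batch, seq_len, heads, head_dim).
    """
    if joint_query is not None:
        # Assumption: text tokens are appended after image tokens along
        # the sequence axis. The real ordering may differ.
        query = np.concatenate([query, joint_query], axis=1)

    # The joint key/value are NOT concatenated. They travel alongside the
    # call so the kernel can treat them as a fixed (static) prefix.
    prefix = {"joint_key": joint_key, "joint_value": joint_value}
    return query, prefix


# Toy shapes: 8 image tokens, 3 text tokens, 2 heads, head_dim 4.
image_q = np.zeros((1, 8, 2, 4))
text_q = np.ones((1, 3, 2, 4))
text_k = np.ones((1, 3, 2, 4))
text_v = np.ones((1, 3, 2, 4))

q, prefix = prepare_ring_inputs(image_q, text_q, text_k, text_v)
print(q.shape)  # sequence axis grows from 8 to 11
```

Keeping the joint key/value out of the concatenated tensors means they do not need to be sharded or passed around the ring with the image key/value blocks.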
post_attention
post_attention(
attn_output: Tensor,
ctx: ParallelAttentionContext | None,
) -> Tensor
pre_attention
pre_attention(
query: Tensor,
key: Tensor,
value: Tensor,
attn_metadata: AttentionMetadata | None,
)
run_attention
run_attention(
query: Tensor,
key: Tensor,
value: Tensor,
attn_metadata: AttentionMetadata | None,
softmax_scale: float | None = None,
causal: bool = False,
) -> Tensor
Run the actual Ring Attention kernel.
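For intuition about what a ring kernel computes, the following single-process sketch accumulates attention over key/value blocks with a running softmax and checks the result against ordinary attention. It is not the kernel used by `run_attention`: `ring_attention_sim` and `full_attention` are hypothetical names, NumPy replaces PyTorch, there is no inter-device communication, and only the non-causal, single-head case is covered.

```python
import numpy as np


def full_attention(q, k, v, scale):
    """Reference softmax attention over the whole key/value sequence."""
    scores = (q @ k.T) * scale
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v


def ring_attention_sim(q, kv_blocks, scale):
    """Process K/V one block at a time, as if each block arrived from a
    neighbouring rank, merging partial results with a running softmax."""
    n, d = q.shape[0], kv_blocks[0][1].shape[1]
    running_max = np.full((n, 1), -np.inf)  # row-wise max score seen so far
    denom = np.zeros((n, 1))                # running softmax denominator
    acc = np.zeros((n, d))                  # running unnormalised output

    for k_blk, v_blk in kv_blocks:
        scores = (q @ k_blk.T) * scale
        new_max = np.maximum(running_max, scores.max(axis=-1, keepdims=True))
        # Rescale earlier partial sums to the new max (0 on the first block).
        correction = np.exp(running_max - new_max)
        p = np.exp(scores - new_max)
        denom = denom * correction + p.sum(axis=-1, keepdims=True)
        acc = acc * correction + p @ v_blk
        running_max = new_max

    return acc / denom


rng = np.random.default_rng(0)
q = rng.standard_normal((5, 4))
k = rng.standard_normal((12, 4))
v = rng.standard_normal((12, 4))
scale = 4 ** -0.5  # passed explicitly, playing the role of softmax_scale

# Split K/V into 3 blocks to mimic 3 ranks in the ring.
blocks = list(zip(np.split(k, 3), np.split(v, 3)))
out_ring = ring_attention_sim(q, blocks, scale)
out_full = full_attention(q, k, v, scale)
print(np.allclose(out_ring, out_full))
```

Because the merge is exact up to floating-point rounding, each rank only ever needs to hold one key/value block at a time, which is the point of the ring strategy.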