vllm_omni.diffusion.attention.parallel.ring ¶

logger `module-attribute` ¶

logger = init_logger(__name__)

RingParallelAttention ¶

Ring sequence-parallel strategy.

This strategy prepares inputs for Ring Attention. Key responsibilities:

- Concatenate joint_query (Text) to query (Image) if present.
- Keep joint_key/value separate in metadata for the Ring kernel to handle as static prefix.
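The two responsibilities above can be sketched in plain Python. This is an illustration only, not the library's code: `ToyMetadata` and `prepare_ring_inputs` are hypothetical names, nested lists stand in for tensors, and placing the text tokens in front of the image tokens is an assumption (the description does not state the order).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyMetadata:
    """Hypothetical stand-in for AttentionMetadata; field names are assumed."""

    joint_query: list[list[float]] | None = None
    joint_key: list[list[float]] | None = None
    joint_value: list[list[float]] | None = None


def prepare_ring_inputs(query, key, value, meta):
    """Fold the text query into the image query; leave joint key/value alone."""
    if meta is not None and meta.joint_query is not None:
        # Assumption: text tokens go in front of the image tokens.
        query = meta.joint_query + query
    # key and value are returned unchanged. joint_key/joint_value stay in
    # `meta` so the attention step can treat them as a static prefix.
    return query, key, value, meta
```

Only the query grows in this sketch. The joint key and value are deliberately not merged into `key` and `value`, which mirrors the "keep separate in metadata" responsibility.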

attn_backend_pref `instance-attribute` ¶

attn_backend_pref = attn_backend_pref

enabled `property` ¶

enabled: bool

name `property` ¶

name: str

post_attention ¶

post_attention(
    attn_output: Tensor,
    ctx: ParallelAttentionContext | None,
) -> Tensor

pre_attention ¶

pre_attention(
    query: Tensor,
    key: Tensor,
    value: Tensor,
    attn_metadata: AttentionMetadata | None,
)

run_attention ¶

run_attention(
    query: Tensor,
    key: Tensor,
    value: Tensor,
    attn_metadata: AttentionMetadata | None,
    softmax_scale: float | None = None,
    causal: bool = False,
) -> Tensor

Run the actual Ring Attention kernel.
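For readers unfamiliar with the technique, the following is a minimal single-process sketch of non-causal ring attention with a static key/value prefix. It is not the kernel this method calls: all function names are hypothetical, nested lists stand in for tensors, a single attention head is assumed, and the "ring" is simulated by rotating a Python list rather than exchanging blocks between devices. Each rank keeps its own query shard, folds in whichever key/value block it currently holds using an online softmax, then passes that block to the next rank.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _accumulate(state, q, keys, values, scale):
    """Fold one K/V block into a running (max, denominator, weighted sum)."""
    m, denom, acc = state
    for k, v in zip(keys, values):
        s = _dot(q, k) * scale
        new_m = max(m, s)
        # Rescale everything accumulated so far to the new running maximum.
        old = 0.0 if m == -math.inf else math.exp(m - new_m)
        w = math.exp(s - new_m)
        denom = denom * old + w
        acc = [a * old + w * x for a, x in zip(acc, v)]
        m = new_m
    return m, denom, acc


def _step(rank_states, queries, keys, values, scale):
    return [_accumulate(st, q, keys, values, scale)
            for st, q in zip(rank_states, queries)]


def ring_attention(q_shards, k_shards, v_shards,
                   prefix_k=(), prefix_v=(), scale=None):
    """Simulate ring attention over len(q_shards) ranks in one process."""
    world = len(q_shards)
    scale = 1.0 / math.sqrt(len(q_shards[0][0])) if scale is None else scale
    v_dim = len(v_shards[0][0])
    states = [[(-math.inf, 0.0, [0.0] * v_dim) for _ in shard]
              for shard in q_shards]
    # Static prefix: every rank attends to it once; it never travels the ring.
    for r in range(world):
        states[r] = _step(states[r], q_shards[r], prefix_k, prefix_v, scale)
    held_k, held_v = list(k_shards), list(v_shards)
    for _ in range(world):
        for r in range(world):
            states[r] = _step(states[r], q_shards[r],
                              held_k[r], held_v[r], scale)
        # Rotate: rank r receives the block that rank r - 1 was holding.
        held_k = held_k[-1:] + held_k[:-1]
        held_v = held_v[-1:] + held_v[:-1]
    return [[[a / denom for a in acc] for _, denom, acc in rank]
            for rank in states]


def full_attention(queries, keys, values, scale=None):
    """Unsharded reference used to check the ring result."""
    scale = 1.0 / math.sqrt(len(queries[0])) if scale is None else scale
    out = []
    for q in queries:
        scores = [_dot(q, k) * scale for k in keys]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[i] for w, v in zip(weights, values)) / total
                    for i in range(len(values[0]))])
    return out
```

Concatenating the per-rank outputs of `ring_attention` should match `full_attention` over the prefix plus all key/value shards up to floating-point rounding, because the online softmax rescaling makes the result independent of the order in which blocks arrive.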