
vllm_omni.diffusion.attention.parallel.ulysses

UlyssesParallelAttention

Ulysses sequence-parallel strategy (all-to-all over seq/head dims).

This preserves the semantics previously implemented in Attention._forward_ulysses:

- If AttentionMetadata.joint_* is provided, joint_query/key/value are concatenated after all-to-all.
- joint_key/value are assumed to be replicated across SP ranks and are sliced by ulysses head rank before concatenation.
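
The layout change described above can be illustrated with a minimal single-process sketch. It is not the library's implementation: no real communication takes place, plain nested lists stand in for tensors, and the helper names (`all_to_all_seq_to_head`, `slice_joint_by_head_rank`) are hypothetical. Appending the joint tokens at the end of the sequence is also an assumption; the text only states that concatenation happens after the all-to-all.

```python
# Single-process simulation of the Ulysses layout change.
# All names are hypothetical; lists indexed [token][head] stand in for tensors.

def all_to_all_seq_to_head(shards, world_size):
    """shards[r][s][h]: rank r holds one sequence shard with all heads.
    Returns out[r][s][h_local]: rank r holds the full sequence for its head slice."""
    num_heads = len(shards[0][0])
    assert num_heads % world_size == 0, "heads must divide evenly across ranks"
    heads_per_rank = num_heads // world_size
    out = []
    for rank in range(world_size):
        lo = rank * heads_per_rank
        hi = lo + heads_per_rank
        # Gather every rank's tokens in rank order, keeping only this rank's heads.
        out.append([token[lo:hi] for shard in shards for token in shard])
    return out


def slice_joint_by_head_rank(joint, rank, world_size):
    """joint[s][h] is replicated on every rank; keep only this rank's heads."""
    heads_per_rank = len(joint[0]) // world_size
    lo = rank * heads_per_rank
    return [token[lo:lo + heads_per_rank] for token in joint]


world_size = 2
seq_len, num_heads, joint_len = 4, 4, 2

# Label each element so its origin stays visible after the shuffle.
full = [[f"s{s}h{h}" for h in range(num_heads)] for s in range(seq_len)]
shard_len = seq_len // world_size
shards = [full[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]
joint = [[f"j{s}h{h}" for h in range(num_heads)] for s in range(joint_len)]

gathered = all_to_all_seq_to_head(shards, world_size)

# Assumption: joint tokens are appended after the main sequence.
per_rank_inputs = [
    gathered[r] + slice_joint_by_head_rank(joint, r, world_size)
    for r in range(world_size)
]
```

After this step each simulated rank sees `seq_len + joint_len` tokens but only `num_heads // world_size` heads, which is what lets attention run locally per head group.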

enabled `property`

enabled: bool

name `property`

name: str

post_attention

post_attention(
    attn_output: Tensor,
    ctx: ParallelAttentionContext | None,
) -> Tensor
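
In the general Ulysses scheme, the attention output is redistributed from a head-sharded layout back to a sequence-sharded one with a second all-to-all. Whether `post_attention` does exactly this, and how it treats any joint tokens, is not stated on this page, so the following is only a single-process sketch of that reverse layout change. The helper name `all_to_all_head_to_seq` is hypothetical and nested lists stand in for tensors.

```python
# Single-process simulation of the reverse (head-sharded -> sequence-sharded)
# layout change. Hypothetical helper; no real communication is performed.

def all_to_all_head_to_seq(per_rank, world_size):
    """per_rank[r][s][h_local]: rank r holds the full sequence for its head slice.
    Returns out[r][s_local][h]: rank r holds one sequence shard with all heads."""
    seq_len = len(per_rank[0])
    assert seq_len % world_size == 0, "sequence must divide evenly across ranks"
    shard_len = seq_len // world_size
    out = []
    for rank in range(world_size):
        lo = rank * shard_len
        shard = []
        for s in range(lo, lo + shard_len):
            # Concatenate the head slices from every rank, in rank order.
            shard.append([h for src in per_rank for h in src[s]])
        out.append(shard)
    return out


world_size = 2
seq_len, heads_per_rank = 4, 2

# Rank r owns heads [r * heads_per_rank, (r + 1) * heads_per_rank) for all tokens.
head_sharded = [
    [
        [f"s{s}h{h}" for h in range(r * heads_per_rank, (r + 1) * heads_per_rank)]
        for s in range(seq_len)
    ]
    for r in range(world_size)
]

seq_sharded = all_to_all_head_to_seq(head_sharded, world_size)
```

Each simulated rank ends up with `seq_len // world_size` tokens carrying all heads again, matching the layout the inputs had before the first all-to-all.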

pre_attention

pre_attention(
    query: Tensor,
    key: Tensor,
    value: Tensor,
    attn_metadata: AttentionMetadata | None,
)