vllm_omni.diffusion.attention.parallel.ulysses

UlyssesParallelAttention

Ulysses sequence-parallel strategy (all-to-all over seq/head dims).

This preserves the semantics previously implemented in Attention._forward_ulysses:

- If AttentionMetadata.joint_* is provided, joint_query/key/value are concatenated after all-to-all.
- joint_key/value are assumed to be replicated across SP ranks and are sliced by ulysses head rank before concatenation.
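The layout change described above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the library's implementation: tensors are modelled as nested `[seq][head]` lists, the helper names (`shard_sequence`, `all_to_all_seq_to_head`, `all_to_all_head_to_seq`, `slice_joint_by_head_rank`) are hypothetical, and the joint tokens are appended after the sequence tokens purely for illustration (the source does not state the concatenation order).

```python
# Single-process simulation of the Ulysses layout change (illustrative only).
# All helper names here are hypothetical; no real distributed calls are made.


def shard_sequence(full, world_size):
    """Split a [seq][head] nested list into per-rank sequence shards."""
    seq_len = len(full)
    assert seq_len % world_size == 0, "sequence must divide evenly across ranks"
    chunk = seq_len // world_size
    return [full[r * chunk:(r + 1) * chunk] for r in range(world_size)]


def all_to_all_seq_to_head(shards):
    """Simulated all-to-all: per rank, (seq/P, H) becomes (seq, H/P)."""
    world_size = len(shards)
    num_heads = len(shards[0][0])
    assert num_heads % world_size == 0, "heads must divide evenly across ranks"
    heads_per_rank = num_heads // world_size
    out = []
    for dst in range(world_size):
        lo, hi = dst * heads_per_rank, (dst + 1) * heads_per_rank
        # Rank `dst` receives its head slice from every source rank; reading
        # sources in rank order rebuilds the full sequence.
        out.append([row[lo:hi] for src in range(world_size) for row in shards[src]])
    return out


def all_to_all_head_to_seq(shards):
    """Inverse all-to-all: per rank, (seq, H/P) becomes (seq/P, H)."""
    world_size = len(shards)
    chunk = len(shards[0]) // world_size
    out = []
    for dst in range(world_size):
        rows = []
        for s in range(dst * chunk, (dst + 1) * chunk):
            row = []
            for src in range(world_size):
                row.extend(shards[src][s])  # stitch head slices back together
            rows.append(row)
        out.append(rows)
    return out


def slice_joint_by_head_rank(joint, rank, world_size):
    """Keep only the heads owned by `rank` from a replicated joint tensor."""
    heads_per_rank = len(joint[0]) // world_size
    lo = rank * heads_per_rank
    return [row[lo:lo + heads_per_rank] for row in joint]


world_size, seq_len, num_heads = 2, 4, 4
full = [[(s, h) for h in range(num_heads)] for s in range(seq_len)]
local = shard_sequence(full, world_size)       # each rank: 2 tokens x 4 heads
gathered = all_to_all_seq_to_head(local)       # each rank: 4 tokens x 2 heads

# Joint tokens are replicated on every rank, so they need no communication:
# each rank slices its own heads and concatenates (order assumed here).
joint = [[("joint", t, h) for h in range(num_heads)] for t in range(2)]
with_joint = [
    gathered[r] + slice_joint_by_head_rank(joint, r, world_size)
    for r in range(world_size)
]

restored = all_to_all_head_to_seq(gathered)    # back to 2 tokens x 4 heads
```

In this sketch, attention would run on the `(seq, H/P)` layout, where every rank sees the whole sequence for its subset of heads, and the inverse exchange returns the output to the sequence-sharded layout.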

enabled property

enabled: bool

name property

name: str

post_attention

post_attention(
    attn_output: Tensor,
    ctx: ParallelAttentionContext | None,
) -> Tensor

pre_attention

pre_attention(
    query: Tensor,
    key: Tensor,
    value: Tensor,
    attn_metadata: AttentionMetadata | None,
)
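How a caller might wrap an attention kernel with the two hooks can be sketched as follows. Everything in this block is a hypothetical stand-in: `ToyContext`, `ToyNoOpStrategy`, and `run_attention` are not part of vllm_omni, and the tuple returned by `pre_attention` is an assumption, since the signature above does not show a return type.

```python
from dataclasses import dataclass


@dataclass
class ToyContext:
    """Hypothetical stand-in for ParallelAttentionContext."""
    strategy_name: str


class ToyNoOpStrategy:
    """Hypothetical strategy exposing the same four members as documented above."""

    @property
    def enabled(self) -> bool:
        return False

    @property
    def name(self) -> str:
        return "noop"

    def pre_attention(self, query, key, value, attn_metadata):
        # ASSUMPTION: returns the (possibly re-laid-out) inputs plus a context
        # that post_attention can use to undo the layout change.
        ctx = ToyContext(strategy_name=self.name)
        return query, key, value, attn_metadata, ctx

    def post_attention(self, attn_output, ctx):
        # A real sequence-parallel strategy would restore the original layout
        # here; the no-op version passes the output through unchanged.
        return attn_output


def run_attention(strategy, kernel, query, key, value, attn_metadata=None):
    """Call the hooks around an arbitrary attention kernel."""
    query, key, value, attn_metadata, ctx = strategy.pre_attention(
        query, key, value, attn_metadata
    )
    attn_output = kernel(query, key, value)
    return strategy.post_attention(attn_output, ctx)


# Toy usage with integers standing in for tensors.
result = run_attention(ToyNoOpStrategy(), lambda q, k, v: q + k + v, 1, 2, 3)
```

The point of the pairing is that whatever `pre_attention` changes about the layout is recorded in the context and reversed by `post_attention`, so the kernel in between stays unaware of the parallel strategy.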