vllm_omni.diffusion.attention.parallel.ulysses
UlyssesParallelAttention
Ulysses sequence-parallel strategy (all-to-all over seq/head dims).
This preserves the semantics previously implemented in Attention._forward_ulysses:
- If AttentionMetadata.joint_* is provided, joint_query/key/value are concatenated after all-to-all.
- joint_key/value are assumed to be replicated across SP ranks and are sliced by ulysses head rank before concatenation.
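The data movement described above can be illustrated with a single-process toy simulation. This is only a sketch under stated assumptions, not the library's implementation: it uses plain Python lists instead of tensors and a real all-to-all collective, the helper names (`head_slice`, `all_to_all_seq_to_head`, `concat_joint`) are hypothetical, heads are assumed to divide evenly into contiguous per-rank blocks, and the joint tokens are appended at the end of the sequence (the actual concatenation position is not stated here).

```python
# Toy, single-process simulation of the Ulysses layout change.
# All helper names below are hypothetical; they are not part of vllm_omni.


def head_slice(num_heads, world_size, rank):
    """Contiguous block of head indices owned by `rank` (assumed layout)."""
    per_rank = num_heads // world_size
    return range(rank * per_rank, (rank + 1) * per_rank)


def all_to_all_seq_to_head(seq_shards, num_heads):
    """Go from sequence-sharded (all heads) to head-sharded (full sequence).

    seq_shards[r] is the list of tokens held by rank r; each token is a
    list with one entry per head.
    """
    world_size = len(seq_shards)
    result = []
    for rank in range(world_size):
        heads = head_slice(num_heads, world_size, rank)
        # Collect every rank's sequence chunk in rank order, keeping only
        # the heads that belong to this rank.
        result.append(
            [[token[h] for h in heads] for shard in seq_shards for token in shard]
        )
    return result


def concat_joint(head_shards, joint_tokens, num_heads):
    """Slice replicated joint tokens by head rank, then concatenate.

    Appending at the end of the sequence is an assumption of this sketch.
    """
    world_size = len(head_shards)
    result = []
    for rank, local in enumerate(head_shards):
        heads = head_slice(num_heads, world_size, rank)
        joint_local = [[token[h] for h in heads] for token in joint_tokens]
        result.append(local + joint_local)
    return result


# Two ranks, four heads, four sequence tokens, one replicated joint token.
WORLD_SIZE, NUM_HEADS, SEQ_LEN = 2, 4, 4
tokens = [[f"s{s}h{h}" for h in range(NUM_HEADS)] for s in range(SEQ_LEN)]
seq_shards = [tokens[0:2], tokens[2:4]]  # each rank starts with half the sequence
joint = [[f"j0h{h}" for h in range(NUM_HEADS)]]  # same copy on every rank

head_shards = all_to_all_seq_to_head(seq_shards, NUM_HEADS)
with_joint = concat_joint(head_shards, joint, NUM_HEADS)

print(with_joint[1][0])   # ['s0h2', 's0h3']
print(with_joint[1][-1])  # ['j0h2', 'j0h3']
```

After the simulated exchange each rank sees the full sequence but only its own heads, so the replicated joint tokens only need a local head slice and no extra communication before they are concatenated.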
post_attention
post_attention(
    attn_output: Tensor,
    ctx: ParallelAttentionContext | None,
) -> Tensor
pre_attention
pre_attention(
    query: Tensor,
    key: Tensor,
    value: Tensor,
    attn_metadata: AttentionMetadata | None,
)