vllm_omni.diffusion.attention.backends.utils.piecewise_attn
Piecewise attention for mixed causal / full (bidirectional) masks.
Dispatches each segment as a separate attention call whose causal flag follows FlashAttention's bottom-right convention (K[:e] is attended by Q[s:e], with causal alignment anchored at the bottom-right corner).
Per segment:
- causal segment [s, e): attn(Q[:, s:e], K[:, :e], V[:, :e], causal=True)
- full-attn span [a, e): attn(Q[:, a:e], K[:, :e], V[:, :e], causal=False)
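
The per-segment rule above can be illustrated by building the visibility mask that each call implies. This is a minimal sketch in plain Python, not the module's implementation: the helper names `segment_mask` and `piecewise_mask` and the `(start, end, causal)` tuple layout are hypothetical, chosen only for illustration. It assumes the bottom-right convention means that query row r of a segment sees key j when j <= r + (n_k - n_q).

```python
def segment_mask(start, end, causal):
    """Boolean mask for queries [start, end) attending keys [0, end).

    Hypothetical helper: True means the query may attend to the key.
    """
    n_q, n_k = end - start, end
    # Bottom-right alignment: the last query row lines up with the last key,
    # so query row r may see keys up to r + (n_k - n_q).
    offset = n_k - n_q
    return [
        [(not causal) or (j <= r + offset) for j in range(n_k)]
        for r in range(n_q)
    ]


def piecewise_mask(segments, total_len):
    """Assemble the full total_len x total_len mask from per-segment masks."""
    full = [[False] * total_len for _ in range(total_len)]
    for start, end, causal in segments:
        for r, row in enumerate(segment_mask(start, end, causal)):
            # Keys at positions >= end stay masked for this segment's queries.
            full[start + r][:end] = row
    return full


# Assumed layout: 3 causal tokens followed by a 2-token full-attention span.
segments = [(0, 3, True), (3, 5, False)]
mask = piecewise_mask(segments, total_len=5)
for row in mask:
    print("".join("x" if visible else "." for visible in row))
```

In this layout the first three rows form a lower triangle, while both rows of the full-attention span see every key up to the end of the span, including each other.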