vllm_omni.diffusion.attention.backends.ring.ring_selector
select_flash_attn_impl
select_flash_attn_impl(
    impl_type: AttnType,
    stage: str = "fwd-only",
    attn_processor: Module | None = None,
) -> Callable[..., tuple[Tensor, Tensor | None]]
Select the attention implementation for the forward pass (inference only).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| impl_type | AttnType | The attention implementation type. | required |
| stage | str | Must be "fwd-only" (backward not supported for inference). | 'fwd-only' |
| attn_processor | Module \| None | Optional custom attention processor. | None |
Returns:
| Type | Description |
|---|---|
| Callable[..., tuple[Tensor, Tensor \| None]] | The attention forward function for the specified implementation. |
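The selector pattern described above (validate `stage`, then return a forward callable that yields an output plus an optional second value) can be sketched in plain Python. This is a toy stand-in and not the vllm_omni implementation: `DemoAttnType`, its members, `select_demo_impl`, and the scalar "attention" functions are all hypothetical names invented for illustration. Raising `ValueError` on an unsupported stage is likewise an assumption, and the second return value is modeled here as a log-sum-exp, which the source does not confirm.

```python
import math
from enum import Enum
from typing import Callable, Optional, Sequence, Tuple

# (output, optional log-sum-exp), mirroring tuple[Tensor, Tensor | None] with floats.
AttnResult = Tuple[float, Optional[float]]


class DemoAttnType(Enum):
    # Hypothetical members for illustration; not the real AttnType values.
    UNIFORM = "uniform"
    SOFTMAX = "softmax"


def _uniform_forward(q: float, k: Sequence[float], v: Sequence[float]) -> AttnResult:
    # Ignores the scores and averages the values; returns no second value.
    return sum(v) / len(v), None


def _softmax_forward(q: float, k: Sequence[float], v: Sequence[float]) -> AttnResult:
    # Numerically stable softmax over scalar scores q * k_i.
    scores = [q * ki for ki in k]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    out = sum(e * vi for e, vi in zip(exps, v)) / z
    return out, m + math.log(z)


_DEMO_IMPLS = {
    DemoAttnType.UNIFORM: _uniform_forward,
    DemoAttnType.SOFTMAX: _softmax_forward,
}


def select_demo_impl(
    impl_type: DemoAttnType,
    stage: str = "fwd-only",
    attn_processor: Optional[object] = None,
) -> Callable[..., AttnResult]:
    # attn_processor is accepted only to mirror the documented signature;
    # this sketch does not use it.
    if stage != "fwd-only":
        # Assumption: the sketch rejects any other stage with ValueError.
        raise ValueError(f"unsupported stage {stage!r}; only 'fwd-only' is allowed")
    if impl_type not in _DEMO_IMPLS:
        raise ValueError(f"unknown implementation type: {impl_type!r}")
    return _DEMO_IMPLS[impl_type]


forward = select_demo_impl(DemoAttnType.SOFTMAX)
out, lse = forward(1.0, [0.0, 0.0], [2.0, 4.0])
print(out, lse)
```

Callers of a selector shaped like this should unpack both return values and treat the second as possibly `None`, since the documented return type allows it.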