vllm_omni.attention.fish_kvcache_backend
install_fish_kvcache_attn_backend
Install the Fish kvcache fast path on this Fish SlowAR model only.
is_fish_kvcache_attn_active_for_model
maybe_attach_fish_kvcache_seq_lens_upper_bound
maybe_attach_fish_kvcache_seq_lens_upper_bound(
    *,
    model_config: Any,
    attn_metadata: Any,
    input_batch: Any,
    optimistic_seq_lens_cpu: Tensor,
    num_reqs: int,
    num_reqs_padded: int,
    max_query_len: int,
    pad_attn: bool,
    for_cudagraph_capture: bool = False,
    num_scheduled_tokens_np: Any = None,
) -> None
Attach CPU seq-len upper bounds for the Fish-only Triton decode fastpath.
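The leading `*` in the signature above makes every parameter keyword-only, so callers must name each argument. The sketch below illustrates only that calling convention. `attach_upper_bound_stub` is a hypothetical stand-in that copies the parameter list and does nothing; it is not the library function, the argument values are placeholders, and `optimistic_seq_lens_cpu` is typed as `Any` here (a `Tensor` in the real signature) to keep the example free of dependencies.

```python
from typing import Any


def attach_upper_bound_stub(
    *,
    model_config: Any,
    attn_metadata: Any,
    input_batch: Any,
    optimistic_seq_lens_cpu: Any,  # annotated as Tensor in the documented signature
    num_reqs: int,
    num_reqs_padded: int,
    max_query_len: int,
    pad_attn: bool,
    for_cudagraph_capture: bool = False,
    num_scheduled_tokens_np: Any = None,
) -> None:
    """Hypothetical stand-in: same parameter list, no behavior."""
    return None


def call_positionally() -> str:
    """Show that positional arguments are rejected by a keyword-only signature."""
    try:
        attach_upper_bound_stub(None, None, None, [3, 5], 2, 4, 1, False)
    except TypeError:
        return "TypeError"
    return "accepted"


# Keyword call: the two defaulted parameters may be omitted.
result = attach_upper_bound_stub(
    model_config=None,
    attn_metadata=None,
    input_batch=None,
    optimistic_seq_lens_cpu=[3, 5],  # placeholder values, not real sequence lengths
    num_reqs=2,
    num_reqs_padded=4,
    max_query_len=1,
    pad_attn=False,
)
```

Like the documented function, the stub returns `None`. The docstring suggests that the real function attaches its result to existing state rather than returning it, but the page does not say which object receives the bounds.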