vllm_gaudi.ops.granite_causal_conv1d
Granite 4.0-specific causal conv1d implementation.
This is a simplified conv1d implementation based on the v0.17.1 code, adapted for the v0.19.0 metadata interface (separate load/store cache indices). It processes one sequence at a time (padded_batch == 1) and supports prefix caching.
Used exclusively by hpu_mamba_mixer2.py (Granite 4.0). Other models continue to use causal_conv1d_pytorch.py.
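As a rough illustration of what a causal conv1d with a carried-over state computes for a single sequence, here is a minimal dependency-free sketch in plain Python. It is not the module's code: the name `causal_conv1d_ref`, the list-of-lists layout, and the convention that the state holds the last `width - 1` inputs per channel are all assumptions made for this example.

```python
import math


def silu(v: float) -> float:
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def causal_conv1d_ref(x, weight, bias=None, initial_state=None, activation="silu"):
    """Depthwise causal conv1d over one sequence (hypothetical reference helper).

    x:             dim rows, each a list of seqlen floats
    weight:        dim rows, each a list of `width` taps (last tap hits the current token)
    bias:          optional list of dim floats
    initial_state: optional dim rows of (width - 1) floats from earlier tokens;
                   treated as zeros when omitted
    Returns (out, final_state) in the same row layout.
    """
    width = len(weight[0])
    out, final_state = [], []
    for c in range(len(x)):
        if initial_state is not None:
            history = list(initial_state[c])
        else:
            history = [0.0] * (width - 1)
        padded = history + list(x[c])  # left context first, then the new tokens
        row = []
        for t in range(len(x[c])):
            acc = bias[c] if bias is not None else 0.0
            for k in range(width):
                acc += weight[c][k] * padded[t + k]
            row.append(silu(acc) if activation == "silu" else acc)
        out.append(row)
        # The last (width - 1) inputs become the state for the next call.
        final_state.append(padded[len(padded) - (width - 1):])
    return out, final_state


# Usage: one pass over four tokens vs. two passes that hand the state across.
w = [[1.0, 1.0, 1.0]]
full, _ = causal_conv1d_ref([[1.0, 2.0, 3.0, 4.0]], w, activation=None)
_, carried = causal_conv1d_ref([[1.0, 2.0]], w, activation=None)
tail, _ = causal_conv1d_ref([[3.0, 4.0]], w, initial_state=carried, activation=None)
```

In this sketch the second chunk reproduces the tail of the single-pass output once it is given the carried state, which is the property that makes reusing a cached state for a shared prefix possible.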
granite_causal_conv1d_fn
granite_causal_conv1d_fn(
    x: Tensor,
    weight: Tensor,
    bias: Tensor | None,
    conv_states: Tensor | None,
    query_start_loc: Tensor,
    enable_prefix_caching: bool = False,
    load_cache_indices: Tensor | None = None,
    store_cache_indices: Tensor | None = None,
    blocks_caching_range: Tensor | None = None,
    seqlens_offsets_for_blocks: Tensor | None = None,
    has_initial_state: Tensor | None = None,
    activation: str | None = "silu",
    metadata=None,
    validate_data: bool = False,
    is_prompt: bool = True,
)
Source code in vllm_gaudi/ops/granite_causal_conv1d.py
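The parameter names suggest a read-from-one-slot, write-to-another cache pattern. The sketch below is a toy model of that idea in plain Python, not the function's actual behavior. It assumes that `query_start_loc` holds cumulative sequence offsets starting at 0, that `has_initial_state` says whether a sequence should read an existing state, and that the state is the last `width - 1` inputs. The helpers `split_by_query_start_loc` and `run_with_cache` are hypothetical, and the convolution itself is omitted.

```python
def split_by_query_start_loc(tokens, query_start_loc):
    """Slice a packed token list into per-sequence lists.

    Assumes cumulative offsets: sequence i is
    tokens[query_start_loc[i]:query_start_loc[i + 1]].
    """
    return [
        tokens[query_start_loc[i]:query_start_loc[i + 1]]
        for i in range(len(query_start_loc) - 1)
    ]


def run_with_cache(tokens, query_start_loc, cache, load_idx, store_idx,
                   has_initial_state, width):
    """Toy driver: read each sequence's state from one slot, write to another."""
    for i, seq in enumerate(split_by_query_start_loc(tokens, query_start_loc)):
        if has_initial_state[i]:
            history = list(cache[load_idx[i]])  # copy so the source slot is untouched
        else:
            history = [0.0] * (width - 1)
        # A real implementation would run the convolution on history + seq here.
        padded = history + list(seq)
        cache[store_idx[i]] = padded[len(padded) - (width - 1):]
    return cache


# Usage: one sequence of three tokens, loading from slot 0 and storing to slot 1.
cache = [[1.0, 2.0], [0.0, 0.0]]
run_with_cache([3.0, 4.0, 5.0], [0, 3], cache,
               load_idx=[0], store_idx=[1], has_initial_state=[True], width=3)
```

Keeping the load and store slots separate means the source slot is never overwritten in this model, so a state shared by several requests could stay valid while each request writes its own continuation elsewhere.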
granite_causal_conv1d_fn_update
granite_causal_conv1d_fn_update(
    x: Tensor,
    weight: Tensor,
    bias: Tensor | None,
    conv_states: Tensor | None,
    query_start_loc: Tensor,
    load_cache_indices: Tensor | None = None,
    store_cache_indices: Tensor | None = None,
    has_initial_state: Tensor | None = None,
    activation: str | None = "silu",
    metadata=None,
    validate_data: bool = False,
    is_prompt: bool = True,
)
granite_causal_conv1d_update
granite_causal_conv1d_update(
    x: Tensor,
    conv_state: Tensor,
    weight: Tensor,
    bias: Tensor | None = None,
    activation: bool | str | None = None,
    load_cache_indices: Tensor | None = None,
    store_cache_indices: Tensor | None = None,
    query_start_loc: Tensor | None = None,
    validate_data: bool = False,
)
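For intuition, a single-token update of a causal conv1d can be modeled as sliding a window of the last `width` inputs per channel. The following plain-Python sketch shows that idea only. The helper `conv1d_update_ref` is hypothetical, the state layout (the previous `width - 1` inputs per channel) is an assumption, and treating `activation=True` as SiLU is likewise an assumption based on the `bool | str | None` type in the signature.

```python
import math


def conv1d_update_ref(x_new, conv_state, weight, bias=None, activation=None):
    """One decode step of a depthwise causal conv1d (hypothetical reference helper).

    x_new:      list of dim floats, one value per channel for the new token
    conv_state: dim rows of (width - 1) floats; updated in place
    weight:     dim rows of `width` taps (last tap hits the new token)
    """
    out = []
    for c, value in enumerate(x_new):
        window = conv_state[c] + [value]  # previous inputs, then the new one
        acc = bias[c] if bias is not None else 0.0
        acc += sum(w * v for w, v in zip(weight[c], window))
        if activation in (True, "silu", "swish"):  # assumed activation spellings
            acc = acc / (1.0 + math.exp(-acc))
        out.append(acc)
        conv_state[c][:] = window[1:]  # drop the oldest input, keep width - 1
    return out


# Usage: two consecutive decode steps on a single channel with width 3.
state = [[1.0, 2.0]]
w = [[1.0, 1.0, 1.0]]
y1 = conv1d_update_ref([3.0], state, w)
y2 = conv1d_update_ref([4.0], state, w)
```

Each step here costs `width` multiply-adds per channel and leaves the state ready for the next token.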