vllm_gaudi.v1.attention.backends.hpu_attn
HPUAttentionBackendV1
Bases: HPUAttentionBackend
Source code in vllm_gaudi/v1/attention/backends/hpu_attn.py
HPUAttentionMetadataV1
dataclass
Bases: HPUAttentionMetadata
Metadata for HPUAttentionBackend.
__init__
__init__(
    block_list: Optional[Tensor],
    block_mapping: Optional[Tensor],
    block_usage: Optional[Tensor],
    block_groups: Optional[Tensor],
    alibi_blocks: Optional[Tensor],
    is_prompt: bool,
    block_size: int,
    slot_mapping: Tensor,
    attn_bias: Optional[Tensor],
    seq_lens_tensor: Optional[Tensor],
    context_lens_tensor: Optional[Tensor],
    input_positions: Tensor,
    seq_lens: Optional[list[int]] = None,
    encoder_seq_lens: Optional[list[int]] = None,
    encoder_seq_lens_tensor: Optional[Tensor] = None,
    max_encoder_seq_len: Optional[int] = None,
    cross_block_list: Optional[Tensor] = None,
    cross_slot_mapping: Optional[Tensor] = None,
    cross_block_mapping: Optional[Tensor] = None,
    cross_block_groups: Optional[Tensor] = None,
    cross_block_usage: Optional[Tensor] = None,
    cross_attn_bias: Optional[Tensor] = None,
    window_block_list: Optional[Tensor] = None,
    window_slot_mapping: Optional[Tensor] = None,
    window_block_mapping: Optional[Tensor] = None,
    window_block_groups: Optional[Tensor] = None,
    window_block_usage: Optional[Tensor] = None,
    window_attn_bias: Optional[Tensor] = None,
    chunked_slot_mapping: Optional[Tensor] = None,
    chunked_attn_bias: Optional[Tensor] = None,
    chunked_block_mapping: Optional[Tensor] = None,
    chunked_block_list: Optional[Tensor] = None,
    chunked_block_groups: Optional[Tensor] = None,
    chunked_block_usage: Optional[Tensor] = None,
    query_start_loc: Optional[Tensor] = None,
) -> None
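The signature above mixes fields without defaults (which must always be passed) with fields that default to None. The following is a minimal sketch of how that split can be inspected on any dataclass. `MetadataSketch` and `required_field_names` are hypothetical names introduced for illustration: the class is a trimmed stand-in, not the real HPUAttentionMetadataV1, plain lists stand in for tensors, and the value 128 for `block_size` is arbitrary.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any, Optional


@dataclass
class MetadataSketch:
    """Hypothetical, trimmed stand-in; NOT the real HPUAttentionMetadataV1."""

    # Fields without defaults must be supplied by the caller.
    block_list: Optional[Any]
    is_prompt: bool
    block_size: int
    slot_mapping: Any
    input_positions: Any
    # Fields with a default of None may be omitted.
    seq_lens: Optional[list[int]] = None
    query_start_loc: Optional[Any] = None


def required_field_names(cls) -> list[str]:
    """Return the names of dataclass fields that have no default value."""
    return [
        f.name
        for f in fields(cls)
        if f.default is MISSING and f.default_factory is MISSING
    ]


required = required_field_names(MetadataSketch)

# Construct with keyword arguments only, so the call does not depend on
# field order. Lists stand in for tensors; 128 is an arbitrary block size.
meta = MetadataSketch(
    block_list=None,
    is_prompt=True,
    block_size=128,
    slot_mapping=[0, 1, 2],
    input_positions=[0, 1, 2],
)
```

Passing every argument by keyword is a design choice worth keeping with a signature this long, since positional calls break silently when fields are reordered.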
make_decode_metadata
classmethod
make_decode_metadata(
    block_list,
    block_usage,
    block_groups,
    input_positions,
    slot_mapping,
    block_size,
    window_block_list,
    window_block_usage,
    window_block_groups,
    chunked_block_list,
    chunked_block_usage,
    chunked_block_groups,
    query_start_loc=None,
)
make_prefill_metadata
classmethod
make_prefill_metadata(
    attn_bias,
    block_list,
    context_lens_tensor,
    seq_lens_tensor,
    slot_mapping,
    block_size,
    query_start_loc=None,
)
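The two factory classmethods each take only the arguments relevant to one phase. The method bodies are not shown on this page, so the sketch below is an assumption about the general pattern rather than the library's implementation: each factory is assumed to set `is_prompt` for its phase and leave the other phase's fields as None. `MetadataSketch` is a hypothetical stand-in with a subset of fields. The `window_*` and `chunked_*` parameters of the real decode signature are omitted, several fields are given None defaults that the real class does not have, and plain lists with arbitrary values stand in for tensors.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MetadataSketch:
    """Hypothetical stand-in with a subset of fields; NOT the real class."""

    is_prompt: bool
    block_size: int
    slot_mapping: Any
    # Defaults below are a simplification made for this sketch only.
    block_list: Optional[Any] = None
    block_usage: Optional[Any] = None
    block_groups: Optional[Any] = None
    input_positions: Optional[Any] = None
    attn_bias: Optional[Any] = None
    seq_lens_tensor: Optional[Any] = None
    context_lens_tensor: Optional[Any] = None
    query_start_loc: Optional[Any] = None

    @classmethod
    def make_prefill_metadata(cls, attn_bias, block_list, context_lens_tensor,
                              seq_lens_tensor, slot_mapping, block_size,
                              query_start_loc=None):
        # Assumption: prefill metadata is flagged with is_prompt=True and
        # decode-only fields (block_usage, block_groups) stay None.
        return cls(
            is_prompt=True,
            block_size=block_size,
            slot_mapping=slot_mapping,
            block_list=block_list,
            attn_bias=attn_bias,
            seq_lens_tensor=seq_lens_tensor,
            context_lens_tensor=context_lens_tensor,
            query_start_loc=query_start_loc,
        )

    @classmethod
    def make_decode_metadata(cls, block_list, block_usage, block_groups,
                             input_positions, slot_mapping, block_size,
                             query_start_loc=None):
        # Assumption: decode metadata is flagged with is_prompt=False and
        # prefill-only fields (attn_bias, sequence lengths) stay None.
        # The window_* and chunked_* parameters are omitted in this sketch.
        return cls(
            is_prompt=False,
            block_size=block_size,
            slot_mapping=slot_mapping,
            block_list=block_list,
            block_usage=block_usage,
            block_groups=block_groups,
            input_positions=input_positions,
            query_start_loc=query_start_loc,
        )


# Illustrative values only; lists stand in for tensors.
prefill = MetadataSketch.make_prefill_metadata(
    attn_bias=None,
    block_list=None,
    context_lens_tensor=[0, 0],
    seq_lens_tensor=[3, 5],
    slot_mapping=[0, 1, 2, 128, 129, 130, 131, 132],
    block_size=128,
)
decode = MetadataSketch.make_decode_metadata(
    block_list=[0, 1],
    block_usage=[4, 6],
    block_groups=[0, 1],
    input_positions=[3, 5],
    slot_mapping=[3, 133],
    block_size=128,
)
```

Named factories of this kind keep call sites short compared with invoking `__init__` directly, since each caller passes only the arguments its phase needs.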