vllm_gaudi.attention.ops.hpu_paged_attn
HPUPageAttentionInputBuilderBase
dataclass
HPUPagedAttention
Source code in vllm_gaudi/attention/ops/hpu_paged_attn.py
copy_blocks
staticmethod
get_kv_cache_shape
staticmethod
get_supported_head_sizes
staticmethod
split_kv_cache
staticmethod
split_kv_cache(
kv_cache: tuple, num_kv_heads: int, head_size: int
) -> tuple[Tensor, Tensor, Tensor, Tensor]
supports_attn_type
classmethod
HPU paged attention supports decoder and encoder-only attention.
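A minimal sketch of what a class-level support check like this can look like, assuming for illustration that attention types are identified by plain strings. The constants, `SketchAttentionBackend`, and `SUPPORTED_ATTN_TYPES` are hypothetical names, not part of `vllm_gaudi`; only the set of supported types (decoder and encoder-only) comes from the docstring above.

```python
# Hypothetical sketch: not the vllm_gaudi implementation.
# Assumes attention types are plain strings; the values below are illustrative.
DECODER = "decoder"
ENCODER = "encoder"
ENCODER_ONLY = "encoder_only"
ENCODER_DECODER = "encoder_decoder"


class SketchAttentionBackend:
    # Attention types this hypothetical backend accepts, per the docstring:
    # decoder and encoder-only attention.
    SUPPORTED_ATTN_TYPES = frozenset({DECODER, ENCODER_ONLY})

    @classmethod
    def supports_attn_type(cls, attn_type: str) -> bool:
        """Return True if attn_type is one of the supported attention types."""
        return attn_type in cls.SUPPORTED_ATTN_TYPES


print(SketchAttentionBackend.supports_attn_type(DECODER))          # True
print(SketchAttentionBackend.supports_attn_type(ENCODER_DECODER))  # False
```

In this sketch the check is a classmethod, so a caller can ask whether a backend handles a given attention type without constructing an instance.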
swap_blocks
staticmethod
swap_blocks(
src_kv_cache: tuple[Tensor, Tensor, Tensor, Tensor],
dst_kv_cache: tuple[Tensor, Tensor, Tensor, Tensor],
src_to_dsts: Tensor,
) -> None
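The signature describes moving cache blocks from one KV cache to another, driven by an index tensor. In vLLM-style backends, "swapping" typically means moving blocks between device and host memory. The pure-Python sketch below illustrates the idea under stated assumptions: `src_to_dsts` is treated as a sequence of `(src_block, dst_block)` pairs, and each cache is a list indexed by block number. Real caches are tensors (and, per the signature, 4-tuples of them), so this is illustrative only; `swap_blocks_sketch` is a hypothetical name.

```python
# Hypothetical sketch of block swapping between two paged caches.
# Assumption: src_to_dsts is a sequence of (src_block, dst_block) index pairs,
# and each cache is indexed by block number along its first dimension.
from typing import List, Sequence, Tuple

Block = List[float]  # one block's worth of cached values (stand-in for a tensor slice)


def swap_blocks_sketch(
    src_cache: List[Block],
    dst_cache: List[Block],
    src_to_dsts: Sequence[Tuple[int, int]],
) -> None:
    """Copy each listed source block into its destination slot (in place)."""
    for src_block, dst_block in src_to_dsts:
        # Copy the contents so later writes to src do not alias dst.
        dst_cache[dst_block] = list(src_cache[src_block])


# Example: move blocks 0 and 2 of a "device" cache into slots 1 and 0 of a "host" cache.
device = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
host = [[0.0, 0.0], [0.0, 0.0]]
swap_blocks_sketch(device, host, [(0, 1), (2, 0)])
print(host)  # [[3.0, 3.0], [1.0, 1.0]]
```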
write_to_paged_cache
staticmethod
write_to_paged_cache(
key: Tensor,
value: Tensor,
key_cache: Tensor,
value_cache: Tensor,
slot_mapping: Tensor,
kv_cache_dtype: str,
is_prompt: bool,
) -> None
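`slot_mapping` tells the cache write where each new token's key and value belong. The minimal pure-Python sketch below assumes the common vLLM convention that a slot index encodes `block_number * block_size + offset_in_block`, and that each cache is laid out as `[num_blocks][block_size]`. The HPU cache layout may differ. `write_to_paged_cache_sketch` and its `block_size` parameter are hypothetical, and the sketch ignores `kv_cache_dtype` and `is_prompt`.

```python
# Hypothetical sketch of scattering new keys/values into a paged cache.
# Assumptions: slot = block_number * block_size + offset_in_block, and each
# cache is laid out as cache[block_number][offset_in_block]. Quantization
# (kv_cache_dtype) and prompt/decode differences (is_prompt) are not modeled.
from typing import List, Optional, Sequence

Cache = List[List[Optional[str]]]


def write_to_paged_cache_sketch(
    key: Sequence[str],
    value: Sequence[str],
    key_cache: Cache,
    value_cache: Cache,
    slot_mapping: Sequence[int],
    block_size: int,
) -> None:
    """Write token i's key/value into the slot given by slot_mapping[i]."""
    for token_idx, slot in enumerate(slot_mapping):
        block_number, offset = divmod(slot, block_size)
        key_cache[block_number][offset] = key[token_idx]
        value_cache[block_number][offset] = value[token_idx]


block_size = 4
num_blocks = 3
k_cache: Cache = [[None] * block_size for _ in range(num_blocks)]
v_cache: Cache = [[None] * block_size for _ in range(num_blocks)]

# Three tokens: two land in block 2 (slots 8, 9), one in block 0 (slot 3).
write_to_paged_cache_sketch(
    ["k0", "k1", "k2"], ["v0", "v1", "v2"], k_cache, v_cache, [8, 9, 3], block_size
)
print(k_cache)  # [[None, None, None, 'k2'], [None, None, None, None], ['k0', 'k1', None, None]]
```

Because tokens from one sequence can land in non-contiguous blocks, the scatter is driven entirely by `slot_mapping` rather than by token order.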
HPUPagedAttentionMetadata
dataclass
Metadata for PagedAttention.
HPUPagedAttentionMetadataBuilder
dataclass
Bases: AttentionMetadataBuilder
__init__
__init__(
input_builder: HPUPageAttentionInputBuilderBase,
) -> None
Create the builder and store some configuration and parameters.