
vllm.v1.attention.backends.utils

CommonAttentionMetadata dataclass

Attention metadata attributes that can be shared by layers in different KV cache groups, even though those layers have different block tables.

Source code in vllm/v1/attention/backends/utils.py
@dataclass
class CommonAttentionMetadata:
    """
    Attention metadata attributes that can be shared by layers in different KV
    cache groups and thus having different block table.
    """

    query_start_loc: torch.Tensor
    """(batch_size + 1,), the start location of each request in query Tensor"""
    seq_lens: torch.Tensor
    """(batch_size,), the length of each request including both computed tokens
    and newly scheduled tokens"""

query_start_loc instance-attribute

query_start_loc: Tensor

(batch_size + 1,), the start location of each request in the query tensor
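A minimal sketch of this layout, assuming the offsets are the prefix sum of the per-request query lengths with a leading zero (which is what the `(batch_size + 1,)` shape suggests). Plain Python lists stand in for the `torch.Tensor` so the sketch has no dependencies, and the helper name `build_query_start_loc` is hypothetical, not part of vLLM:

```python
from itertools import accumulate


def build_query_start_loc(query_lens):
    """Return batch_size + 1 offsets: a leading 0 followed by the running sum.

    Hypothetical helper for illustration only; the real field is a tensor.
    """
    return [0, *accumulate(query_lens)]


# Three requests contributing 3, 1, and 4 query tokens respectively.
query_lens = [3, 1, 4]
query_start_loc = build_query_start_loc(query_lens)

# Request i owns the flattened query tokens in [start_loc[i], start_loc[i + 1]).
tokens = list(range(query_start_loc[-1]))
per_request = [
    tokens[start:end]
    for start, end in zip(query_start_loc, query_start_loc[1:])
]

# Per-request query lengths can be recovered from adjacent offsets.
recovered_lens = [
    end - start for start, end in zip(query_start_loc, query_start_loc[1:])
]
```

Under this assumption the last entry equals the total number of query tokens in the batch, which is why the field holds one more element than there are requests.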

seq_lens instance-attribute

seq_lens: Tensor

(batch_size,), the length of each request including both computed tokens and newly scheduled tokens
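The description above can be sketched as a per-request sum of already computed tokens and newly scheduled tokens. This is an illustration under that reading only: lists stand in for the `torch.Tensor`, and the names `build_seq_lens`, `num_computed_tokens`, and `num_scheduled_tokens` are hypothetical:

```python
def build_seq_lens(num_computed_tokens, num_scheduled_tokens):
    """Return one total length per request: computed + newly scheduled tokens.

    Hypothetical helper for illustration only; the real field is a tensor.
    """
    if len(num_computed_tokens) != len(num_scheduled_tokens):
        raise ValueError("both inputs must have one entry per request")
    return [c + s for c, s in zip(num_computed_tokens, num_scheduled_tokens)]


# Request 0: decoding one new token after 10 computed tokens.
# Request 1: a fresh prompt of 5 tokens with nothing computed yet.
# Request 2: decoding one new token after 7 computed tokens.
num_computed_tokens = [10, 0, 7]
num_scheduled_tokens = [1, 5, 1]
seq_lens = build_seq_lens(num_computed_tokens, num_scheduled_tokens)
```

Note that `seq_lens` has one entry per request, so it is one element shorter than `query_start_loc`.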

__init__

__init__(query_start_loc: Tensor, seq_lens: Tensor) -> None