vLLM

vllm.v1.attention.backends.utils

CommonAttentionMetadata `dataclass` ¶

Attention metadata attributes that can be shared by layers in different KV cache groups, which therefore have different block tables.

Source code in vllm/v1/attention/backends/utils.py

from dataclasses import dataclass

import torch


@dataclass
class CommonAttentionMetadata:
    """
    Attention metadata attributes that can be shared by layers in different KV
    cache groups, which therefore have different block tables.
    """

    query_start_loc: torch.Tensor
    """(batch_size + 1,), the start location of each request in query Tensor"""
    seq_lens: torch.Tensor
    """(batch_size,), the length of each request including both computed tokens
    and newly scheduled tokens"""

query_start_loc `instance-attribute` ¶

query_start_loc: Tensor

Shape (batch_size + 1,): the start location of each request in the query tensor.

seq_lens `instance-attribute` ¶

seq_lens: Tensor

Shape (batch_size,): the length of each request, including both computed tokens and newly scheduled tokens.
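
The relationship between the two attributes can be illustrated with a small sketch. It is not vLLM code: it assumes the query tokens of all requests are packed contiguously in batch order, uses plain Python lists in place of `torch.Tensor`, and the helper names (`sketch_metadata_fields`, `query_lens_from`) and their inputs are hypothetical.

```python
from itertools import accumulate


def sketch_metadata_fields(num_computed_tokens, num_scheduled_tokens):
    """Hypothetical helper: derive both fields as plain lists (not tensors)."""
    # Assumption: queries are packed back to back, so the start locations are
    # a running sum of the newly scheduled token counts, with a leading 0.
    # This yields batch_size + 1 entries; the last one is the total query length.
    query_start_loc = [0] + list(accumulate(num_scheduled_tokens))
    # seq_lens counts already-computed tokens plus newly scheduled tokens.
    seq_lens = [c + s for c, s in zip(num_computed_tokens, num_scheduled_tokens)]
    return query_start_loc, seq_lens


def query_lens_from(query_start_loc):
    """Recover per-request query lengths from consecutive start locations."""
    return [end - start for start, end in zip(query_start_loc, query_start_loc[1:])]


# Three requests: a fresh prefill of 4 tokens, a decode step of 1 token,
# and a chunk of 3 tokens on top of 12 already-computed tokens.
query_start_loc, seq_lens = sketch_metadata_fields([0, 5, 12], [4, 1, 3])
print(query_start_loc)                   # [0, 4, 5, 8]
print(seq_lens)                          # [4, 6, 15]
print(query_lens_from(query_start_loc))  # [4, 1, 3]
```

Under these assumptions, request `i` owns the query slice `query_start_loc[i]:query_start_loc[i + 1]`, which is why the attribute has one more entry than the batch size.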

__init__ ¶

__init__(query_start_loc: Tensor, seq_lens: Tensor) -> None