vllm.config.kv_events

Classes:

KVEventsConfig

Configuration for KV event publishing.

Attributes:

  • buffer_steps (int) –

    The number of steps to cache for the replay endpoint. Only events from the last N steps are saved for the replay endpoint.

  • enable_kv_cache_events (bool) –

    If True, enable KV cache events for tracking block storage and removal.

  • endpoint (str) –

    The zmq endpoint to use for publishing kv events.

  • hwm (int) –

    The zmq high water mark for the event publisher. After queueing N events, events will start dropping if the consumer is not keeping up.

  • max_queue_size (int) –

    The maximum number of events to queue while waiting for publishing.

  • publisher (Literal['null', 'zmq']) –

    The publisher to use for publishing kv events. Can be "null", "zmq".

  • replay_endpoint (str | None) –

    The zmq endpoint to use for replaying kv events.

  • topic (str) –

    The topic to use for the event publisher. Consumers can subscribe to this topic to receive events.

Source code in vllm/config/kv_events.py
@config
class KVEventsConfig:
    """Configuration for KV event publishing."""

    enable_kv_cache_events: bool = False
    """If True, enable KV cache events for tracking block storage and removal.
    Events can be published externally by zmq using the event publisher config.
    """

    publisher: Literal["null", "zmq"] = None  # type: ignore[assignment]
    """The publisher to use for publishing kv events. Can be "null", "zmq".
    """

    endpoint: str = "tcp://*:5557"
    """The zmq endpoint to use for publishing kv events.
    """

    replay_endpoint: str | None = None
    """The zmq endpoint to use for replaying kv events.
    """

    buffer_steps: int = 10_000
    """The number of steps to cache for replay endpoint. Will only save
    events from the last N steps for the replay endpoint.
    """

    hwm: int = 100_000
    """The zmq high water mark for the event publisher. After queueing N events,
    events will start dropping if the consumer is not keeping up.
    """

    max_queue_size: int = 100_000
    """The maximum number of events to queue while waiting for publishing.
    """

    topic: str = ""
    """The topic to use for the event publisher. Consumers can subscribe to
    this topic to receive events.
    """

    def __post_init__(self):
        if self.publisher is None:
            self.publisher = "zmq" if self.enable_kv_cache_events else "null"
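
The default resolution in `__post_init__` can be tried out without installing vLLM. The following is a minimal sketch: `KVEventsConfigSketch` is a hypothetical stand-in that copies only the two fields involved and uses a plain `dataclass` in place of vLLM's `@config` decorator.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class KVEventsConfigSketch:
    """Hypothetical stand-in that copies only two fields of KVEventsConfig."""

    enable_kv_cache_events: bool = False
    publisher: Optional[Literal["null", "zmq"]] = None

    def __post_init__(self) -> None:
        # Same rule as the source above: an unset publisher follows the flag.
        if self.publisher is None:
            self.publisher = "zmq" if self.enable_kv_cache_events else "null"


default_cfg = KVEventsConfigSketch()
enabled_cfg = KVEventsConfigSketch(enable_kv_cache_events=True)
explicit_cfg = KVEventsConfigSketch(enable_kv_cache_events=True, publisher="null")

print(default_cfg.publisher)   # null
print(enabled_cfg.publisher)   # zmq
print(explicit_cfg.publisher)  # null (an explicit choice is left untouched)
```

Under this rule, setting `enable_kv_cache_events=True` alone is enough to select the zmq publisher, while an explicitly passed `publisher` is never overridden.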

buffer_steps = 10000 class-attribute instance-attribute

The number of steps to cache for replay endpoint. Will only save events from the last N steps for the replay endpoint.
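
To illustrate what "the last N steps" means in practice, here is a small sketch of a bounded buffer. It is not vLLM's implementation: `ReplayBufferSketch`, `record`, and `replay_from` are hypothetical names, and the events are plain strings chosen for the example.

```python
from collections import deque


class ReplayBufferSketch:
    """Hypothetical buffer that keeps event batches from the last N steps."""

    def __init__(self, buffer_steps: int) -> None:
        # A deque with maxlen silently discards the oldest entry when full.
        self._steps = deque(maxlen=buffer_steps)

    def record(self, step: int, events: list) -> None:
        """Store the events produced during one step."""
        self._steps.append((step, list(events)))

    def replay_from(self, start_step: int) -> list:
        """Return the buffered (step, events) pairs with step >= start_step."""
        return [(s, e) for s, e in self._steps if s >= start_step]


buf = ReplayBufferSketch(buffer_steps=3)
for step in range(5):
    buf.record(step, [f"event-{step}"])

# Steps 0 and 1 have been evicted, so a replay from step 0 starts at step 2.
print([s for s, _ in buf.replay_from(0)])  # [2, 3, 4]
```

A consumer that falls further behind than `buffer_steps` cannot recover the evicted steps from a buffer like this, which is the trade-off a larger value buys back at the cost of memory.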

enable_kv_cache_events = False class-attribute instance-attribute

If True, enable KV cache events for tracking block storage and removal. Events can be published externally by zmq using the event publisher config.

endpoint = 'tcp://*:5557' class-attribute instance-attribute

The zmq endpoint to use for publishing kv events.

hwm = 100000 class-attribute instance-attribute

The zmq high water mark for the event publisher. After queueing N events, events will start dropping if the consumer is not keeping up.

max_queue_size = 100000 class-attribute instance-attribute

The maximum number of events to queue while waiting for publishing.

publisher = None class-attribute instance-attribute

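The publisher to use for publishing kv events. Can be "null", "zmq". If left unset (None), `__post_init__` resolves it to "zmq" when enable_kv_cache_events is True and to "null" otherwise.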

replay_endpoint = None class-attribute instance-attribute

The zmq endpoint to use for replaying kv events.

topic = '' class-attribute instance-attribute

The topic to use for the event publisher. Consumers can subscribe to this topic to receive events.
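
ZeroMQ SUB sockets filter by byte prefix, and an empty subscription matches every message. The sketch below mimics that rule in pure Python to show how a consumer's subscription interacts with `topic`. It assumes the publisher places `topic` at the front of each message, as ZeroMQ PUB/SUB filtering expects. The function name and the topic strings are hypothetical.

```python
def sub_socket_would_deliver(subscription: str, message_topic: str) -> bool:
    """Mimic ZeroMQ SUB filtering: deliver when the topic starts with the subscription."""
    return message_topic.encode().startswith(subscription.encode())


# An empty subscription receives everything, whatever topic the publisher uses.
print(sub_socket_would_deliver("", "kv-events"))           # True

# A prefix of the topic is enough to match.
print(sub_socket_would_deliver("kv", "kv-events"))         # True

# A subscription that is not a prefix of the topic filters the message out.
print(sub_socket_would_deliver("kv-events", "other"))      # False

# With the default topic '', only an empty subscription can match.
print(sub_socket_would_deliver("kv-events", ""))           # False
```

With the default `topic = ''`, consumers would therefore subscribe with an empty string. A non-empty topic becomes useful when several publishers share one endpoint and consumers need to tell them apart.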