vllm.v1.worker.gpu.model_states.interface ¶
Classes:
- ModelSpecificAttnMetadata – Base class for model-specific attention metadata.
- ModelState
ModelSpecificAttnMetadata ¶
Base class for model-specific attention metadata.
Source code in vllm/v1/worker/gpu/model_states/interface.py
ModelState ¶
Bases: ABC
Methods:
- custom_sampler – Wrap or replace the default sampler.
Attributes:
- num_new_sampled_tokens_per_step (int) – New tokens sampled on each decode step.
Source code in vllm/v1/worker/gpu/model_states/interface.py
num_new_sampled_tokens_per_step = 1 (class attribute, instance attribute) ¶
Number of new tokens sampled on each decode step, also known as the number of bonus tokens. Accepted draft tokens are not counted.
custom_sampler(sampler) ¶
Wrap or replace the default sampler.
Called after model loading with the already-constructed base Sampler. Return None to keep the defaults, or a tuple (sampler, rejection_sampler | None) to override them.
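The return contract described above can be illustrated with a minimal, self-contained sketch. It does not import vLLM: `Sampler`, `RejectionSampler`, and `ModelState` below are simplified stand-ins, and `LoggingSampler`, `DefaultState`, `WrappingState`, and `resolve_samplers` are hypothetical names introduced only for illustration. The real classes have additional members and different method signatures, so treat this as a sketch of the None-versus-tuple convention rather than of the actual implementation.

```python
from __future__ import annotations

from abc import ABC
from typing import Optional, Tuple


class Sampler:
    """Stand-in for the base sampler (assumption: not vLLM's real class)."""

    def sample(self, logits: list[float]) -> int:
        # Greedy pick: index of the largest logit.
        return max(range(len(logits)), key=logits.__getitem__)


class RejectionSampler:
    """Stand-in for a rejection sampler (assumption: not vLLM's real class)."""


class ModelState(ABC):
    """Simplified stand-in exposing only the members documented here."""

    # Documented default: one new token sampled per decode step.
    num_new_sampled_tokens_per_step: int = 1

    def custom_sampler(
        self, sampler: Sampler
    ) -> Optional[Tuple[Sampler, Optional[RejectionSampler]]]:
        # Returning None keeps the default samplers.
        return None


class LoggingSampler(Sampler):
    """Hypothetical wrapper that records every token the inner sampler picks."""

    def __init__(self, inner: Sampler) -> None:
        self.inner = inner
        self.history: list[int] = []

    def sample(self, logits: list[float]) -> int:
        token = self.inner.sample(logits)
        self.history.append(token)
        return token


class DefaultState(ModelState):
    """Keeps the defaults by inheriting custom_sampler unchanged."""


class WrappingState(ModelState):
    """Overrides the sampler but supplies no rejection sampler."""

    def custom_sampler(self, sampler: Sampler):
        return LoggingSampler(sampler), None


def resolve_samplers(
    state: ModelState, base: Sampler
) -> Tuple[Sampler, Optional[RejectionSampler]]:
    """Hypothetical caller-side helper: apply the override if one is returned."""
    override = state.custom_sampler(base)
    if override is None:
        return base, None
    return override
```

In this sketch, a caller would pass the already-built base sampler to `resolve_samplers` once after loading; `DefaultState` yields the base sampler unchanged, while `WrappingState` yields a `LoggingSampler` that delegates to it.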