speculators.data_generation.vllm_client
Functions:
- generate_hidden_states – Runs decode with max_tokens 1 to generate hidden states and returns path to hidden states file.
- generate_hidden_states_async – Runs decode with max_tokens 1 to generate hidden states and returns path to hidden states file.
- with_retries – Decorator that adds retry logic with exponential backoff.
generate_hidden_states
generate_hidden_states(
client: Client,
model: str,
token_ids: list[int],
timeout: float | None = DEFAULT_REQUEST_TIMEOUT,
) -> str
Runs a decode with max_tokens set to 1 to generate hidden states and returns the path to the hidden states file.
Source code in speculators/data_generation/vllm_client.py
generate_hidden_states_async async
generate_hidden_states_async(
client: AsyncClient,
model: str,
token_ids: list[int],
timeout: float | None = DEFAULT_REQUEST_TIMEOUT,
) -> str
Runs a decode with max_tokens set to 1 to generate hidden states and returns the path to the hidden states file.
Args:
- client: The async OpenAI client.
- model: The model ID.
- token_ids: The input token IDs.
- timeout: Timeout in seconds for each request attempt. None for no timeout.
Source code in speculators/data_generation/vllm_client.py
with_retries
Decorator that adds retry logic with exponential backoff.
The decorated function gains a max_retries keyword argument (default DEFAULT_MAX_RETRIES). InvalidResponseError is never retried. Works for both sync and async functions.
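The behavior described above can be illustrated with a minimal sketch. This is not the implementation in vllm_client.py. The name with_retries_sketch, the BASE_DELAY constant, the value given to DEFAULT_MAX_RETRIES, and the local InvalidResponseError class are all stand-ins chosen for illustration. The sketch also assumes that max_retries counts retries after the first attempt, which the source does not state.

```python
import asyncio
import functools
import inspect
import time

DEFAULT_MAX_RETRIES = 3  # assumed value; the real default is not given in the source
BASE_DELAY = 0.01  # hypothetical base delay in seconds


class InvalidResponseError(Exception):
    """Local stand-in for the library's exception of the same name."""


def with_retries_sketch(func):
    """Retry func with exponential backoff; never retry InvalidResponseError."""

    def delay_for(attempt):
        # Delay doubles on every failed attempt: base, 2*base, 4*base, ...
        return BASE_DELAY * (2 ** attempt)

    if inspect.iscoroutinefunction(func):

        @functools.wraps(func)
        async def async_wrapper(*args, max_retries=DEFAULT_MAX_RETRIES, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except InvalidResponseError:
                    raise  # a malformed response will not improve on retry
                except Exception:
                    if attempt == max_retries:
                        raise  # retries exhausted, surface the last error
                    await asyncio.sleep(delay_for(attempt))

        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, max_retries=DEFAULT_MAX_RETRIES, **kwargs):
        for attempt in range(max_retries + 1):
            try:
                return func(*args, **kwargs)
            except InvalidResponseError:
                raise
            except Exception:
                if attempt == max_retries:
                    raise
                time.sleep(delay_for(attempt))

    return sync_wrapper


# Usage: a function that fails twice with a transient error, then succeeds.
calls = {"n": 0}


@with_retries_sketch
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = flaky(max_retries=5)


@with_retries_sketch
async def always_invalid():
    raise InvalidResponseError("malformed response")
```

Choosing the wrapper by checking inspect.iscoroutinefunction once at decoration time is one way to let a single decorator serve both sync and async callers. The wrapper consumes max_retries itself, so the decorated function never sees that keyword.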