Request Stats
- class vllm_router.stats.request_stats.RequestStats(qps: float, ttft: float, in_prefill_requests: int, in_decoding_requests: int, finished_requests: int, uptime: int, avg_decoding_length: float, avg_latency: float, avg_itl: float, num_swapped_requests: int)
- class vllm_router.stats.request_stats.RequestStatsMonitor(*args, **kwargs)
Monitors the request statistics of all serving engines.
- get_request_stats(current_time: float) → Dict[str, RequestStats]
Get the request statistics for each serving engine.
- Parameters:
current_time – The current timestamp in seconds
- Returns:
A dictionary where the key is the serving engine URL and the value is the request statistics for that engine. The TTFT and inter-token latency are reported as -1 if no requests finished within the sliding window.
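Because -1 is a sentinel rather than a measurement, callers that aggregate TTFT across engines probably want to filter it out first. The following is a minimal sketch of that filtering, not part of the library: `RequestStats` here is a local stand-in dataclass that mirrors the documented fields so the example runs without `vllm_router` installed, and `mean_ttft`, the engine URLs, and all values in `snapshot` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RequestStats:
    # Local stand-in mirroring the documented fields (assumption: the real
    # class exposes them as plain attributes with these names).
    qps: float
    ttft: float
    in_prefill_requests: int
    in_decoding_requests: int
    finished_requests: int
    uptime: int
    avg_decoding_length: float
    avg_latency: float
    avg_itl: float
    num_swapped_requests: int


def mean_ttft(stats: Dict[str, RequestStats]) -> Optional[float]:
    """Average TTFT over engines that reported one; None if none did."""
    # Skip the -1 sentinel used when no request finished in the window.
    valid = [s.ttft for s in stats.values() if s.ttft >= 0]
    return sum(valid) / len(valid) if valid else None


# Hypothetical snapshot shaped like the return value of get_request_stats().
snapshot = {
    "http://engine-a:8000": RequestStats(2.0, 0.25, 1, 3, 10, 60, 128.0, 1.5, 0.02, 0),
    "http://engine-b:8000": RequestStats(0.0, -1.0, 0, 0, 0, 60, 0.0, 0.0, -1.0, 0),
}
print(mean_ttft(snapshot))  # prints 0.25; engine-b is skipped
```

Returning `None` when every engine reports the sentinel keeps "no data" distinct from a real latency of zero.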
- on_new_request(engine_url: str, request_id: str, timestamp: float)
Tell the monitor that a new request has been created.
- Parameters:
engine_url – The URL of the serving engine
request_id – The global request ID
timestamp – The timestamp when the request was created
- on_request_complete(engine_url: str, request_id: str, timestamp: float)
Tell the monitor that a request has been completed.
- Parameters:
engine_url – The URL of the serving engine
request_id – The global request ID
timestamp – The timestamp when the request was completed
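The two callbacks above are meant to be called as a pair, with the same `engine_url` and `request_id`, once when a request is created and once when it completes. The sketch below illustrates that pairing with `ToyMonitor`, a hypothetical stand-in that only borrows the documented method names and parameters. It is not the `vllm_router` implementation, and its bookkeeping (an in-flight table plus a per-engine latency list) is an assumption made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyMonitor:
    """Hypothetical stand-in, NOT the vllm_router implementation."""

    def __init__(self) -> None:
        # (engine_url, request_id) -> creation timestamp of in-flight requests
        self._start: Dict[Tuple[str, str], float] = {}
        # engine_url -> end-to-end latencies (seconds) of completed requests
        self.latencies: Dict[str, List[float]] = defaultdict(list)

    def on_new_request(self, engine_url: str, request_id: str, timestamp: float) -> None:
        self._start[(engine_url, request_id)] = timestamp

    def on_request_complete(self, engine_url: str, request_id: str, timestamp: float) -> None:
        started = self._start.pop((engine_url, request_id), None)
        if started is not None:  # ignore completions with no recorded start
            self.latencies[engine_url].append(timestamp - started)

    def in_flight(self) -> int:
        return len(self._start)


monitor = ToyMonitor()
url = "http://engine-a:8000"  # hypothetical engine URL
monitor.on_new_request(url, "req-1", timestamp=100.0)
monitor.on_new_request(url, "req-2", timestamp=100.5)
monitor.on_request_complete(url, "req-1", timestamp=102.0)
print(monitor.in_flight(), monitor.latencies[url])  # prints: 1 [2.0]
```

Passing timestamps explicitly, as the documented signatures do, keeps the monitor independent of the wall clock, which makes this kind of bookkeeping easy to test.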