Router Logic
- class vllm_router.routers.routing_logic.SessionRouter(*args, **kwargs)
Bases: RoutingInterface
Route the request to the appropriate engine URL based on the session key in the request headers.
- route_request(endpoints: List[EndpointInfo], engine_stats: Dict[str, EngineStats], request_stats: Dict[str, RequestStats], request: fastapi.Request) → str
Route the request to the appropriate engine URL based on the session ID in the request headers. If the request headers contain no session ID, the router picks the engine with the lowest QPS.
- Parameters:
endpoints (List[EndpointInfo]) – The list of engine URLs
engine_stats (Dict[str, EngineStats]) – The engine stats indicating the ‘physical’ load of each engine
request_stats (Dict[str, RequestStats]) – The request stats indicating the request-level performance of each engine
request (Request) – The incoming request
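The behavior described above (sticky routing on a session ID, with a lowest-QPS fallback) can be sketched as follows. This is a minimal illustration under stated assumptions, not the router's actual code: the function name `pick_engine_by_session`, the header name `x-session-id`, the plain `qps_by_url` dictionary, and the hash-modulo mapping are all hypothetical stand-ins for `EndpointInfo`, `RequestStats`, and whatever mapping the real `SessionRouter` uses.

```python
import hashlib
from typing import Dict, List, Optional


def pick_engine_by_session(
    urls: List[str],
    qps_by_url: Dict[str, float],
    headers: Dict[str, str],
    session_key: str = "x-session-id",  # hypothetical header name
) -> str:
    """Pick an engine URL for a request (illustrative sketch only)."""
    if not urls:
        raise ValueError("no engine URLs available")

    session_id: Optional[str] = headers.get(session_key)
    if session_id is None:
        # No session ID: fall back to the engine with the lowest QPS.
        # Engines without recorded stats are treated as idle (0.0).
        return min(urls, key=lambda url: qps_by_url.get(url, 0.0))

    # Hash the session ID with a stable hash so the same session
    # maps to the same engine as long as the set of engines is unchanged.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(urls)
    # Sort so the result does not depend on the order of the input list.
    return sorted(urls)[index]
```

A plain modulo mapping reshuffles most sessions whenever an engine is added or removed. A deployment that needs sessions to stay put across such changes would typically use consistent hashing instead.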
- class vllm_router.routers.routing_logic.RoundRobinRouter(*args, **kwargs)
Bases: RoutingInterface
- route_request(endpoints: List[EndpointInfo], engine_stats: Dict[str, EngineStats], request_stats: Dict[str, RequestStats], request: fastapi.Request) → str
Route the request to the appropriate engine URL using a simple round-robin algorithm.
- Parameters:
endpoints (List[EndpointInfo]) – The list of engine URLs
engine_stats (Dict[str, EngineStats]) – The engine stats indicating the ‘physical’ load of each engine
request_stats (Dict[str, RequestStats]) – The request stats indicating the request-level performance of each engine
request (Request) – The incoming request
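A simple round-robin policy like the one described above can be sketched as a counter over the list of engine URLs. The class name `RoundRobinPicker` and its `pick` method are hypothetical and exist only for this illustration; the sketch works on plain URL strings rather than `EndpointInfo` objects and is not the router's actual implementation.

```python
from typing import List


class RoundRobinPicker:
    """Cycle through engine URLs in a fixed order (illustrative sketch only)."""

    def __init__(self) -> None:
        # Index of the next engine to hand out; grows without bound
        # and is reduced modulo the current number of engines.
        self._next_index = 0

    def pick(self, urls: List[str]) -> str:
        if not urls:
            raise ValueError("no engine URLs available")
        # Sort so the rotation order is stable even if the caller
        # passes the endpoints in a different order each time.
        ordered = sorted(urls)
        chosen = ordered[self._next_index % len(ordered)]
        self._next_index += 1
        return chosen
```

Usage: calling `pick(["http://a", "http://b"])` three times on one instance returns `http://a`, `http://b`, then `http://a` again.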
- class vllm_router.routers.routing_logic.KvawareRouter(*args, **kwargs)
Bases: RoutingInterface
Route the request to the engine URL that holds the KV cache for the longest prefix match.
- async route_request(endpoints: List[EndpointInfo], engine_stats: Dict[str, EngineStats], request_stats: Dict[str, RequestStats], request: fastapi.Request, request_json: Dict) → str
Route the request to the engine URL that holds the KV cache for the longest prefix match. If there is no session ID in the request header, the router picks a server in round-robin order.
- Parameters:
endpoints (List[EndpointInfo]) – The list of engine URLs
engine_stats (Dict[str, EngineStats]) – The engine stats indicating the ‘physical’ load of each engine
request_stats (Dict[str, RequestStats]) – The request stats indicating the request-level performance of each engine
request (Request) – The incoming request
request_json (Dict) – The request body (needed for finding the longest prefix match)
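The idea of routing by longest prefix match can be sketched with an in-memory table of cached token sequences per engine. Everything here is an assumption made for illustration: the function names, the `cached_by_url` dictionary, and the use of integer token lists are hypothetical, and the sketch does not show how `KvawareRouter` actually looks up KV cache locations.

```python
from typing import Dict, List, Optional, Sequence


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Return the number of leading tokens that a and b share."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


def pick_engine_by_prefix(
    prompt_tokens: Sequence[int],
    cached_by_url: Dict[str, List[List[int]]],  # hypothetical cache index
) -> Optional[str]:
    """Pick the engine whose cached sequences share the longest prefix
    with the prompt. Returns None when no engine shares any prefix, so
    the caller can apply a fallback policy (illustrative sketch only)."""
    best_url: Optional[str] = None
    best_len = 0
    # Iterate in sorted order so ties resolve deterministically
    # in favor of the first URL alphabetically.
    for url in sorted(cached_by_url):
        for cached_tokens in cached_by_url[url]:
            match_len = common_prefix_len(prompt_tokens, cached_tokens)
            if match_len > best_len:
                best_url, best_len = url, match_len
    return best_url
```

Returning `None` rather than choosing a default keeps the fallback decision outside the matching step, so the same sketch works with any fallback policy.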
- start_kv_manager()
Start the KV manager.