Router Logic#

class vllm_router.routers.routing_logic.SessionRouter(*args, **kwargs)#

Bases: RoutingInterface

Route the request to the appropriate engine URL based on the session key in the request headers

route_request(endpoints: List[EndpointInfo], engine_stats: Dict[str, EngineStats], request_stats: Dict[str, RequestStats], request: fastapi.Request) str#

Route the request to the appropriate engine URL by the ‘session id’ in the request headers. If there is no session id in the request header, it will pick a server with lowest qps

Parameters:
  • endpoints (List[EndpointInfo]) – The list of engine URLs

  • engine_stats (Dict[str, EngineStats]) – The engine stats indicating the ‘physical’ load of each engine

  • request_stats (Dict[str, RequestStats]) – The request stats indicating the request-level performance of each engine

  • request (Request) – The incoming request

class vllm_router.routers.routing_logic.RoundRobinRouter(*args, **kwargs)#

Bases: RoutingInterface

route_request(endpoints: List[EndpointInfo], engine_stats: Dict[str, EngineStats], request_stats: Dict[str, RequestStats], request: fastapi.Request) str#

Route the request to the appropriate engine URL using a simple round-robin algorithm

Parameters:
  • endpoints (List[EndpointInfo]) – The list of engine URLs

  • engine_stats (Dict[str, EngineStats]) – The engine stats indicating the ‘physical’ load of each engine

  • request_stats (Dict[str, RequestStats]) – The request stats indicating the request-level performance of each engine

  • request (Request) – The incoming request

class vllm_router.routers.routing_logic.KvawareRouter(*args, **kwargs)#

Bases: RoutingInterface

Route the request to the appropriate engine URL by where the KV cache of the longest prefix match is found.

query_manager(msg) str#

Get the instance id for the given message

async route_request(endpoints: List[EndpointInfo], engine_stats: Dict[str, EngineStats], request_stats: Dict[str, RequestStats], request: fastapi.Request, request_json: Dict) str#

Route the request to the appropriate engine URL by where the KV cache of the longest prefix match is found. If there is no session id in the request header, it will pick a server with round robin.

Parameters:
  • endpoints (List[EndpointInfo]) – The list of engine URLs

  • engine_stats (Dict[str, EngineStats]) – The engine stats indicating the ‘physical’ load of each engine

  • request_stats (Dict[str, RequestStats]) – The request stats indicating the request-level performance of each engine

  • request (Request) – The incoming request

  • request_json (Dict) – The request body (needed for finding the

  • match) (longest prefix)

start_kv_manager()#

Start the kv manager