Service Discovery
Batch Service
- class vllm_router.services.batch_service.batch.BatchStatus(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Represents the status of a batch job.
- class vllm_router.services.batch_service.batch.BatchEndpoint(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Represents the available OpenAI API endpoints for batch requests.
Ref https://platform.openai.com/docs/api-reference/batch/create#batch-create-endpoint.
- class vllm_router.services.batch_service.batch.BatchRequest(input_file_id: str, endpoint: BatchEndpoint, completion_window: str, metadata: Dict[str, Any] | None = None)
Represents a single request in a batch.
- class vllm_router.services.batch_service.batch.BatchInfo(id: str, status: BatchStatus, input_file_id: str, created_at: int, endpoint: str, completion_window: str, output_file_id: str | None = None, error_file_id: str | None = None, in_progress_at: int | None = None, expires_at: int | None = None, finalizing_at: int | None = None, completed_at: int | None = None, failed_at: int | None = None, expired_at: int | None = None, cancelling_at: int | None = None, cancelled_at: int | None = None, total_requests: int | None = None, completed_requests: int = 0, failed_requests: int = 0, metadata: Dict[str, Any] | None = None)
Represents batch job information.
Ref https://platform.openai.com/docs/api-reference/batch/object
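As an illustration of how the defaults in the BatchInfo signature behave, here is a minimal sketch that uses local stand-in classes rather than the real vllm_router ones. The stand-in BatchInfo carries only a subset of the documented fields, and the BatchStatus member names are assumptions borrowed from the status values in the OpenAI batch object reference; the actual enum members in vllm_router may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class BatchStatus(str, Enum):
    # Assumed members, named after OpenAI batch status strings.
    VALIDATING = "validating"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"


@dataclass
class BatchInfo:
    # Stand-in mirroring a subset of the documented constructor arguments.
    id: str
    status: BatchStatus
    input_file_id: str
    created_at: int
    endpoint: str
    completion_window: str
    output_file_id: Optional[str] = None
    total_requests: Optional[int] = None
    completed_requests: int = 0
    failed_requests: int = 0
    metadata: Optional[Dict[str, Any]] = None


# A freshly created batch: only the required fields are supplied, so the
# optional file id and timestamps stay None and the counters start at zero.
info = BatchInfo(
    id="batch_123",                  # hypothetical identifier
    status=BatchStatus.VALIDATING,
    input_file_id="file_abc",        # hypothetical identifier
    created_at=1700000000,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
```

All identifiers and values passed to the constructor above are made up for the example.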
File Service
- class vllm_router.services.files_service.file_storage.FileStorage(base_path: str = '/tmp/vllm_files')
Bases: Storage
File storage implementation using the local filesystem.
Files are stored in the following directory structure: /tmp/vllm_files/<user_id>/<file_id>
user_id is not used in the current implementation. It is reserved for future use.
- async get_file(file_id: str, user_id: str = 'uid_default') → OpenAIFile
Retrieve file metadata from disk.
- async get_file_content(file_id: str, user_id: str = 'uid_default') → bytes
Retrieve file content from disk.
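The directory layout described above can be sketched as follows. This is not the FileStorage implementation; resolve_file_path and read_file_content are hypothetical helpers that only illustrate how a file ID maps to a path under the base directory and how the bytes might be read without blocking the event loop.

```python
import asyncio
from pathlib import Path


def resolve_file_path(
    file_id: str,
    user_id: str = "uid_default",
    base_path: str = "/tmp/vllm_files",
) -> Path:
    # Layout from the docs: <base_path>/<user_id>/<file_id>
    return Path(base_path) / user_id / file_id


async def read_file_content(
    file_id: str,
    user_id: str = "uid_default",
    base_path: str = "/tmp/vllm_files",
) -> bytes:
    # Hypothetical helper: read the file in a worker thread so the
    # event loop is not blocked by disk I/O (requires Python 3.9+).
    path = resolve_file_path(file_id, user_id, base_path)
    return await asyncio.to_thread(path.read_bytes)
```

Because user_id always falls back to 'uid_default' here, every file lands in the same subdirectory, which matches the note that the parameter is reserved for future use.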
Request Service
- async vllm_router.services.request_service.request.process_request(request: fastapi.Request, body, backend_url, request_id, endpoint, background_tasks: fastapi.BackgroundTasks, debug_request=None)
Process a request by sending it to the chosen backend.
- Parameters:
request (Request) – The incoming FastAPI request object.
body – The content of the request to send to the backend.
backend_url – The URL of the backend to send the request to.
request_id – A unique identifier for the request.
endpoint – The endpoint to send the request to on the backend.
debug_request – The original request object from the client, used for optional debug logging.
- Yields:
The response headers and status code, followed by the response content.
- Raises:
HTTPError – If the backend returns a 4xx or 5xx status code.
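The "Yields" contract above, with headers and status code first and content afterwards, can be consumed as in the sketch below. fake_process_request is a hypothetical stand-in that imitates that shape; it is assumed here that the first item is a (headers, status) pair and that later items are byte chunks, which the reference text suggests but does not spell out.

```python
import asyncio
from typing import AsyncIterator, Dict, Tuple, Union

Item = Union[Tuple[Dict[str, str], int], bytes]


async def fake_process_request() -> AsyncIterator[Item]:
    # Stand-in generator: first the headers and status code...
    yield {"content-type": "application/json"}, 200
    # ...then the response content, chunk by chunk.
    yield b'{"id": '
    yield b'"cmpl-1"}'


async def consume(stream: AsyncIterator[Item]):
    # Pull the first item separately, then drain the remaining chunks.
    headers, status = await stream.__anext__()
    body = b""
    async for chunk in stream:
        body += chunk
    return headers, status, body


headers, status, body = asyncio.run(consume(fake_process_request()))
```

Reading the first item on its own lets a caller set the response status and headers before any content is forwarded.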
- async vllm_router.services.request_service.request.route_general_request(request: fastapi.Request, endpoint: str, background_tasks: fastapi.BackgroundTasks)
Route the incoming request to the backend server and stream the response back to the client.
This function extracts the requested model from the request body and retrieves the corresponding endpoints. It uses routing logic to determine the best server URL to handle the request, then streams the request to that server. If the requested model is not available, it returns an error response.
- Parameters:
request (Request) – The incoming HTTP request.
endpoint (str) – The endpoint to which the request should be routed.
- Returns:
A response object that streams data from the backend server to the client.
- Return type:
StreamingResponse
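The selection step described for route_general_request (read the model from the body, look up matching endpoints, pick one, or report that the model is unavailable) might look roughly like the sketch below. The ENDPOINTS table, the choose_backend helper, and the round-robin policy are all assumptions made for illustration; the router's actual discovery and routing logic is not shown in this reference.

```python
import itertools
import json
from typing import Dict, List, Optional

# Hypothetical model-to-backend table; the real router obtains
# endpoints through its service discovery component.
ENDPOINTS: Dict[str, List[str]] = {
    "llama-3": ["http://backend-a:8000", "http://backend-b:8000"],
}

_counter = itertools.count()


def choose_backend(raw_body: bytes) -> Optional[str]:
    # Extract the requested model from the JSON request body.
    model = json.loads(raw_body).get("model")
    candidates = ENDPOINTS.get(model, [])
    if not candidates:
        # No backend serves this model; the caller would return an error.
        return None
    # Round-robin is used here purely as a placeholder routing policy.
    return candidates[next(_counter) % len(candidates)]
```

A caller would turn a None result into an error response and otherwise stream the request to the returned URL.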