Service Discovery

Batch Service

class vllm_router.services.batch_service.batch.BatchStatus

Bases: str, Enum

Represents the status of a batch job.
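The status values are not listed here. The following is a minimal sketch that assumes the members mirror the batch statuses in the OpenAI batch object reference (validating, failed, in_progress, finalizing, completed, expired, cancelling, cancelled). It shows why the str mixin is convenient: members compare equal to plain strings, can be rebuilt from stored strings, and serialize with the standard json module.

```python
import json
from enum import Enum


class BatchStatus(str, Enum):
    # Assumed members, taken from the OpenAI batch object's status values.
    VALIDATING = "validating"
    FAILED = "failed"
    IN_PROGRESS = "in_progress"
    FINALIZING = "finalizing"
    COMPLETED = "completed"
    EXPIRED = "expired"
    CANCELLING = "cancelling"
    CANCELLED = "cancelled"


# The str mixin makes members equal to their plain-string values...
assert BatchStatus.COMPLETED == "completed"
# ...lets a stored string be turned back into a member...
assert BatchStatus("in_progress") is BatchStatus.IN_PROGRESS
# ...and lets json.dumps serialize members without a custom encoder.
print(json.dumps({"status": BatchStatus.IN_PROGRESS}))  # {"status": "in_progress"}
```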

class vllm_router.services.batch_service.batch.BatchEndpoint

Bases: str, Enum

Represents the available OpenAI API endpoints for batch requests.

Ref https://platform.openai.com/docs/api-reference/batch/create#batch-create-endpoint.
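This sketch shows how such an enum can validate the endpoint field of an incoming batch request. It assumes members for three of the endpoints listed in the OpenAI reference; the real member names and set may differ, and `parse_endpoint` is a hypothetical helper, not part of vllm_router.

```python
from enum import Enum


class BatchEndpoint(str, Enum):
    # Assumed members: three of the endpoints the OpenAI batch API accepts.
    CHAT_COMPLETIONS = "/v1/chat/completions"
    COMPLETIONS = "/v1/completions"
    EMBEDDINGS = "/v1/embeddings"


def parse_endpoint(raw: str) -> BatchEndpoint:
    """Hypothetical helper: map a client-supplied string onto the enum."""
    try:
        # Lookup by value raises ValueError for any string that is not a member.
        return BatchEndpoint(raw)
    except ValueError:
        supported = ", ".join(e.value for e in BatchEndpoint)
        raise ValueError(
            f"unsupported batch endpoint {raw!r}; expected one of: {supported}"
        ) from None


endpoint = parse_endpoint("/v1/embeddings")
print(endpoint.value)  # /v1/embeddings
```

Rejecting unknown endpoints at parse time means an invalid batch never reaches a backend.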

class vllm_router.services.batch_service.batch.BatchRequest(input_file_id: str, endpoint: BatchEndpoint, completion_window: str, metadata: Dict[str, Any] | None = None)

Represents a request to create a batch job: the input file, the target endpoint, and the completion window.
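The constructor fields match the body of an OpenAI create-batch call. The sketch below models BatchRequest as a dataclass (an assumption; only the signature is documented) and parses a JSON body into it with a hypothetical `batch_request_from_body` helper. BatchEndpoint is reduced to a single assumed member, and the file ID is made up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class BatchEndpoint(str, Enum):
    # Reduced stand-in for the documented enum (assumed member).
    CHAT_COMPLETIONS = "/v1/chat/completions"


@dataclass
class BatchRequest:
    # Field names and types follow the documented constructor signature.
    input_file_id: str
    endpoint: BatchEndpoint
    completion_window: str
    metadata: Optional[Dict[str, Any]] = None


def batch_request_from_body(body: Dict[str, Any]) -> BatchRequest:
    """Hypothetical parser for an OpenAI-style create-batch JSON body."""
    return BatchRequest(
        input_file_id=body["input_file_id"],
        # Converting here rejects unknown endpoints with ValueError.
        endpoint=BatchEndpoint(body["endpoint"]),
        completion_window=body["completion_window"],
        # metadata is optional in the body, so fall back to None.
        metadata=body.get("metadata"),
    )


req = batch_request_from_body({
    "input_file_id": "file-abc123",  # hypothetical uploaded file ID
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h",
})
print(req.input_file_id, req.endpoint.value, req.completion_window, req.metadata)
# file-abc123 /v1/chat/completions 24h None
```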

class vllm_router.services.batch_service.batch.BatchInfo(id: str, status: BatchStatus, input_file_id: str, created_at: int, endpoint: str, completion_window: str, output_file_id: str | None = None, error_file_id: str | None = None, in_progress_at: int | None = None, expires_at: int | None = None, finalizing_at: int | None = None, completed_at: int | None = None, failed_at: int | None = None, expired_at: int | None = None, cancelling_at: int | None = None, cancelled_at: int | None = None, total_requests: int | None = None, completed_requests: int = 0, failed_requests: int = 0, metadata: Dict[str, Any] | None = None)

Represents batch job information.

Ref https://platform.openai.com/docs/api-reference/batch/object

to_dict() → Dict[str, Any]

Convert the instance to a dictionary.
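A minimal sketch of to_dict(), assuming BatchInfo is a dataclass and that the method returns every field with the status enum reduced to its plain string value. Only a subset of the documented fields is reproduced, BatchStatus has just two assumed members, and the IDs are hypothetical.

```python
import json
import time
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Dict, Optional


class BatchStatus(str, Enum):
    # Two assumed members are enough for this sketch.
    VALIDATING = "validating"
    COMPLETED = "completed"


@dataclass
class BatchInfo:
    # Trimmed field list; the documented class has more optional timestamps.
    id: str
    status: BatchStatus
    input_file_id: str
    created_at: int  # Unix timestamp in seconds
    endpoint: str
    completion_window: str
    output_file_id: Optional[str] = None
    completed_at: Optional[int] = None
    total_requests: Optional[int] = None
    completed_requests: int = 0
    failed_requests: int = 0
    metadata: Optional[Dict[str, Any]] = None

    def to_dict(self) -> Dict[str, Any]:
        # asdict() copies every field; then store the enum's plain string value.
        data = asdict(self)
        data["status"] = self.status.value
        return data


info = BatchInfo(
    id="batch_123",  # hypothetical ID
    status=BatchStatus.VALIDATING,
    input_file_id="file-abc123",
    created_at=int(time.time()),
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(json.dumps(info.to_dict()))
```

Returning plain strings rather than enum members keeps the dictionary safe to pass to any JSON encoder.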

File Service

class vllm_router.services.files_service.file_storage.FileStorage(base_path: str = '/tmp/vllm_files')

Bases: Storage

File storage implementation using the local filesystem.

Files are stored in the directory structure <base_path>/<user_id>/<file_id>; with the default base_path this is /tmp/vllm_files/<user_id>/<file_id>.

The user_id parameter is not used in the current implementation; it is reserved for future use.

async delete_file(file_id: str, user_id: str = 'uid_default')

Delete a file from disk.

async get_file(file_id: str, user_id: str = 'uid_default') → OpenAIFile

Retrieve a file's metadata from disk.

async get_file_content(file_id: str, user_id: str = 'uid_default') → bytes

Retrieve a file's content from disk.

async list_files(user_id: str = 'uid_default') → List[str]

List all files in storage.

async save_file(file_id: str = None, user_id: str = 'uid_default', file_name: str = None, content: bytes = None, purpose: str = 'batch') → OpenAIFile

Save file content to disk.

async save_file_chunk(file_id: str, user_id: str = 'uid_default', chunk: bytes = None, purpose: str = 'batch', offset: int = 0) → None

Save a file chunk to disk at the specified offset.
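The offset parameter of save_file_chunk allows an upload to be written piece by piece. The class below is a hypothetical, simplified stand-in for FileStorage, not the router's implementation: it stores raw bytes under <base_path>/<user_id>/<file_id>, keeps no OpenAIFile metadata, ignores purpose and file_name, and runs blocking file I/O in a worker thread with asyncio.to_thread (Python 3.9+).

```python
import asyncio
import os
import tempfile
import uuid
from typing import List, Optional


class LocalFileStore:
    """Hypothetical, simplified stand-in for FileStorage: raw bytes only, no metadata."""

    def __init__(self, base_path: str = "/tmp/vllm_files"):
        self.base_path = base_path

    def _path(self, file_id: str, user_id: str) -> str:
        # Layout: <base_path>/<user_id>/<file_id>
        return os.path.join(self.base_path, user_id, file_id)

    async def save_file(self, file_id: Optional[str] = None, user_id: str = "uid_default",
                        content: bytes = b"") -> str:
        # Generating an ID when none is given is an assumed behaviour.
        file_id = file_id or f"file-{uuid.uuid4().hex}"
        await asyncio.to_thread(self._write, self._path(file_id, user_id), content)
        return file_id

    async def save_file_chunk(self, file_id: str, user_id: str = "uid_default",
                              chunk: bytes = b"", offset: int = 0) -> None:
        await asyncio.to_thread(self._write_at, self._path(file_id, user_id), chunk, offset)

    async def get_file_content(self, file_id: str, user_id: str = "uid_default") -> bytes:
        return await asyncio.to_thread(self._read, self._path(file_id, user_id))

    async def list_files(self, user_id: str = "uid_default") -> List[str]:
        user_dir = os.path.join(self.base_path, user_id)
        return sorted(os.listdir(user_dir)) if os.path.isdir(user_dir) else []

    async def delete_file(self, file_id: str, user_id: str = "uid_default") -> None:
        await asyncio.to_thread(os.remove, self._path(file_id, user_id))

    # Blocking helpers; the async methods above run them in a worker thread.
    @staticmethod
    def _write(path: str, content: bytes) -> None:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:  # truncates any existing file
            f.write(content)

    @staticmethod
    def _write_at(path: str, chunk: bytes, offset: int) -> None:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        # O_CREAT without O_TRUNC creates the file if needed and keeps bytes already written.
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        with os.fdopen(fd, "r+b") as f:
            f.seek(offset)  # writing past EOF leaves a zero-filled gap
            f.write(chunk)

    @staticmethod
    def _read(path: str) -> bytes:
        with open(path, "rb") as f:
            return f.read()


async def demo() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        store = LocalFileStore(base_path=tmp)
        # Chunks can arrive in any order; each lands at its own byte offset.
        await store.save_file_chunk("file-1", chunk=b"world", offset=6)
        await store.save_file_chunk("file-1", chunk=b"hello ", offset=0)
        print(await store.get_file_content("file-1"))  # b'hello world'
        print(await store.list_files())  # ['file-1']


asyncio.run(demo())
```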

Request Service

async vllm_router.services.request_service.request.process_request(request: fastapi.Request, body, backend_url, request_id, endpoint, background_tasks: fastapi.BackgroundTasks, debug_request=None)

Process a request by sending it to the chosen backend.

Parameters:

request (Request) – The incoming FastAPI request.
body – The content of the request to send to the backend.
backend_url – The URL of the backend to send the request to.
request_id – A unique identifier for the request.
endpoint – The endpoint to send the request to on the backend.
debug_request – The original request object from the client, used for optional debug logging.

Yields:

The response headers and status code, followed by the response content.

Raises:

HTTPError – If the backend returns a 4xx or 5xx status code.
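The Yields section describes a common streaming-proxy pattern: an async generator first yields the response headers and status code, so the caller can build its response object, and then yields the body chunks. Below is a self-contained sketch of that pattern. `fake_backend` is a hypothetical stand-in for the real HTTP call, `proxy_stream` is an illustrative name, and RuntimeError stands in for HTTPError.

```python
import asyncio
from typing import AsyncIterator, Dict, Tuple, Union

Headers = Dict[str, str]


async def fake_backend(body: bytes) -> Tuple[int, Headers, AsyncIterator[bytes]]:
    """Hypothetical stand-in for the backend call: echoes the body in two chunks."""
    async def chunks() -> AsyncIterator[bytes]:
        half = len(body) // 2
        yield body[:half]
        yield body[half:]

    return 200, {"content-type": "application/octet-stream"}, chunks()


async def proxy_stream(body: bytes) -> AsyncIterator[Union[Tuple[Headers, int], bytes]]:
    status, headers, stream = await fake_backend(body)
    if status >= 400:
        # The documented function raises HTTPError for 4xx/5xx responses.
        raise RuntimeError(f"backend returned HTTP {status}")
    # First item: headers and status code, so the caller can build its response.
    yield headers, status
    # Remaining items: the body, forwarded chunk by chunk without buffering.
    async for chunk in stream:
        yield chunk


async def main() -> None:
    gen = proxy_stream(b"hello world")
    headers, status = await gen.__anext__()  # builtin anext() on Python 3.10+
    body = b"".join([chunk async for chunk in gen])
    print(status, headers["content-type"], body)  # 200 application/octet-stream b'hello world'


asyncio.run(main())
```

Yielding the metadata first lets the caller set the status and headers before the first body byte is sent to the client.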

async vllm_router.services.request_service.request.route_general_request(request: fastapi.Request, endpoint: str, background_tasks: fastapi.BackgroundTasks)

Route the incoming request to the backend server and stream the response back to the client.

This function extracts the requested model from the request body and retrieves the corresponding endpoints. It uses routing logic to determine the best server URL to handle the request, then streams the request to that server. If the requested model is not available, it returns an error response.

Parameters:

request (Request) – The incoming HTTP request.
endpoint (str) – The endpoint to which the request should be routed.

Returns:

A response object that streams data from the backend server to the client.

Return type:

StreamingResponse
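The routing steps described above (extract the model, find matching endpoints, pick one server, or fail) can be sketched as follows. EndpointInfo, RoundRobinRouter, and choose_backend are hypothetical names. Round-robin is only one possible policy, and the error payload shape is an assumption; the router's actual routing logic and error response may differ. The sketch stops at server selection and does not stream the request.

```python
import itertools
import json
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class EndpointInfo:
    """Hypothetical record of a discovered backend and the model it serves."""
    url: str
    model_name: str


class RoundRobinRouter:
    """One possible routing policy: rotate through the candidates for each model."""

    def __init__(self) -> None:
        self._counters: Dict[str, itertools.count] = {}

    def pick(self, model: str, candidates: List[EndpointInfo]) -> str:
        counter = self._counters.setdefault(model, itertools.count())
        return candidates[next(counter) % len(candidates)].url


def choose_backend(
    raw_body: bytes, endpoints: List[EndpointInfo], router: RoundRobinRouter
) -> Tuple[Optional[str], Optional[dict]]:
    # 1. Extract the requested model from the JSON request body.
    model = json.loads(raw_body).get("model")
    # 2. Keep only the endpoints that serve that model.
    candidates = [e for e in endpoints if e.model_name == model]
    # 3. No match: return an error payload (hypothetical shape) instead of a URL.
    if not candidates:
        return None, {"error": f"model {model!r} is not available"}
    # 4. Let the routing policy choose one server URL.
    return router.pick(model, candidates), None


endpoints = [
    EndpointInfo("http://10.0.0.1:8000", "llama-3-8b"),  # hypothetical backends
    EndpointInfo("http://10.0.0.2:8000", "llama-3-8b"),
]
router = RoundRobinRouter()
body = b'{"model": "llama-3-8b", "prompt": "hi"}'
print(choose_backend(body, endpoints, router))  # ('http://10.0.0.1:8000', None)
print(choose_backend(body, endpoints, router))  # ('http://10.0.0.2:8000', None)
print(choose_backend(b'{"model": "other"}', endpoints, router))
# (None, {'error': "model 'other' is not available"})
```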
