
vllm_omni.distributed.omni_coordinator.load_balancer

LeastQueueLengthBalancer

Bases: LoadBalancer

Select the replica with the smallest queue_length.

If multiple replicas share the same minimum queue length, one of them is chosen uniformly at random.

Raises:

Type Description
ValueError

If any replica has a negative queue_length.

select

select(task: Task, replicas: list[ReplicaInfo]) -> int
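
The selection rule described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the `ReplicaInfo` dataclass is a stand-in that assumes only the `queue_length` field, `select_least_queue_length` is a hypothetical helper name, and raising `ValueError` on an empty list is an assumption carried over from the base-class contract.

```python
import random
from dataclasses import dataclass


@dataclass
class ReplicaInfo:
    """Stand-in for the real ReplicaInfo; only queue_length is assumed."""

    queue_length: int


def select_least_queue_length(replicas: list[ReplicaInfo]) -> int:
    """Return the index of a replica with the smallest queue_length."""
    if not replicas:
        # Assumption: mirrors the ValueError documented on LoadBalancer.select.
        raise ValueError("replicas must not be empty")
    if any(r.queue_length < 0 for r in replicas):
        raise ValueError("queue_length must be non-negative")

    shortest = min(r.queue_length for r in replicas)
    # Collect every index tied for the minimum, then pick one uniformly.
    candidates = [i for i, r in enumerate(replicas) if r.queue_length == shortest]
    return random.choice(candidates)
```

With queue lengths `[3, 1, 1]`, the sketch returns index 1 or 2 with equal probability and never index 0.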

LoadBalancer

Bases: ABC

Abstract base class for load balancers.

Subclasses implement select to choose a replica for a given task.

select abstractmethod

select(task: Task, replicas: list[ReplicaInfo]) -> int

Route a task to one of the available replicas.

Parameters:

Name Type Description Default
task Task

The task to route. Some policies ignore it (the random policy, for example); it is reserved for strategies that inspect task metadata.

required
replicas list[ReplicaInfo]

List of available replicas to choose from.

required

Returns:

Type Description
int

Index of the selected replica in replicas.

Raises:

Type Description
ValueError

If replicas is empty.
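
A custom policy would plug in by subclassing and implementing select. The sketch below shows that shape under stated assumptions: `LoadBalancer` here is a local stand-in for the documented abstract class, `FirstReplicaBalancer` is a hypothetical policy invented for illustration, and `Any` replaces the real `Task` and `ReplicaInfo` types.

```python
from abc import ABC, abstractmethod
from typing import Any


class LoadBalancer(ABC):
    """Local stand-in for the documented abstract base class."""

    @abstractmethod
    def select(self, task: Any, replicas: list[Any]) -> int:
        """Return the index of the chosen replica in replicas."""


class FirstReplicaBalancer(LoadBalancer):
    """Hypothetical policy: always route to the first replica."""

    def select(self, task: Any, replicas: list[Any]) -> int:
        if not replicas:
            # Matches the documented contract: an empty list is an error.
            raise ValueError("replicas must not be empty")
        return 0
```

Because select is abstract, instantiating the base class directly raises `TypeError`; only concrete subclasses can be constructed.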

LoadBalancingPolicy

Bases: str, Enum

Enumeration for load balancing policies.

These policies are used by LoadBalancer implementations to route each task to one of the available replicas.

LEAST_QUEUE_LENGTH class-attribute instance-attribute

LEAST_QUEUE_LENGTH = 'least-queue-length'

RANDOM class-attribute instance-attribute

RANDOM = 'random'

ROUND_ROBIN class-attribute instance-attribute

ROUND_ROBIN = 'round-robin'
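
Because the enum mixes in str, a policy can be built from a plain string (for example, one read from configuration) and compared against strings directly. The sketch below redefines the enum locally with the values listed above so it runs without the package installed.

```python
from enum import Enum


class LoadBalancingPolicy(str, Enum):
    """Local copy of the documented enum, for illustration only."""

    LEAST_QUEUE_LENGTH = "least-queue-length"
    RANDOM = "random"
    ROUND_ROBIN = "round-robin"


# Look a member up by its string value.
policy = LoadBalancingPolicy("round-robin")

# The str mixin makes members compare equal to their raw values.
is_same_member = policy is LoadBalancingPolicy.ROUND_ROBIN
equals_plain_string = policy == "round-robin"
```

An unknown string such as `"fastest"` raises `ValueError` at lookup time, which gives a natural place to reject bad configuration.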

RandomBalancer

Bases: LoadBalancer

Load balancer that selects a replica uniformly at random.

select

select(task: Task, replicas: list[ReplicaInfo]) -> int
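
Uniform random selection amounts to drawing an index in the range of the list. A minimal sketch, where `select_random` is a hypothetical helper name and the empty-list check is assumed from the base-class contract:

```python
import random


def select_random(replicas: list) -> int:
    """Return a uniformly random index into replicas."""
    if not replicas:
        # Assumption: mirrors the ValueError documented on LoadBalancer.select.
        raise ValueError("replicas must not be empty")
    return random.randrange(len(replicas))
```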

RoundRobinBalancer

Bases: LoadBalancer

Load balancer that selects replicas in a round-robin fashion.

This implementation keeps a running index modulo len(replicas). It therefore depends on the order and stable meaning of the replicas list between calls. If the list length or ordering changes, the sequence of picks may skip or repeat entries relative to a fixed set of backends.

Concurrency: a threading.Lock serializes updates to _next_index for callers that invoke select from multiple threads or alongside threaded infrastructure (e.g. ZMQ receive threads).

select

select(task: Task, replicas: list[ReplicaInfo]) -> int
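
The running-index behavior and the lock can be sketched as below. This is an illustration under assumptions, not the library's code: `RoundRobinSketch` is a hypothetical class, the attribute name `_next_index` is taken from the description above, and the exact update rule is a guess at one reasonable way to keep an index modulo len(replicas).

```python
import threading


class RoundRobinSketch:
    """Hypothetical round-robin picker guarded by a lock."""

    def __init__(self) -> None:
        self._next_index = 0
        self._lock = threading.Lock()

    def select(self, replicas: list) -> int:
        if not replicas:
            raise ValueError("replicas must not be empty")
        # Read and advance the index atomically so that concurrent
        # callers never observe or write a stale value.
        with self._lock:
            index = self._next_index % len(replicas)
            self._next_index = (index + 1) % len(replicas)
        return index


balancer = RoundRobinSketch()
three = ["a", "b", "c"]
picks = [balancer.select(three) for _ in range(4)]  # [0, 1, 2, 0]

# Shrinking the list mid-sequence changes which backend each index means.
after_shrink = [balancer.select(["a", "b"]) for _ in range(2)]  # [1, 0]
```

The second list shows the caveat about list changes: the stored index carries over, so the sequence relative to a fixed set of backends is no longer a clean rotation.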

Task

Bases: TypedDict

Task structure passed to StagePool.pick / LoadBalancer.select.

Mirrors the dict built around a stage submission with request_id and any payload-related fields a future load-balancing policy might inspect.

engine_inputs instance-attribute

engine_inputs: Any

request_id instance-attribute

request_id: str

sampling_params instance-attribute

sampling_params: Any
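
A Task is an ordinary dict at runtime; the TypedDict only informs static type checkers. The sketch below redefines the type locally with the three documented fields. The field values are placeholders chosen for illustration and say nothing about what the real engine inputs or sampling parameters look like.

```python
from typing import Any, TypedDict


class Task(TypedDict):
    """Local copy of the documented TypedDict, for illustration only."""

    request_id: str
    engine_inputs: Any
    sampling_params: Any


# Placeholder values; the real payload shapes are not specified here.
task: Task = {
    "request_id": "req-0",
    "engine_inputs": {"prompt": "hello"},
    "sampling_params": None,
}

# Fields are read with normal dict access.
request_id = task["request_id"]
```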