vllm_omni.distributed.omni_coordinator.load_balancer
LeastQueueLengthBalancer
Bases: LoadBalancer
Select the replica with the smallest queue_length.
If multiple replicas share the same minimum queue length, one of them is chosen uniformly at random.
Raises:
| Type | Description |
|---|---|
| `ValueError` | If any replica has a negative `queue_length`. |
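The selection rule above (minimum `queue_length`, uniform random tie-break, `ValueError` on a negative length) can be sketched as a standalone function. This is a hypothetical illustration, not the vllm_omni code: `ReplicaInfo` is a stand-in dataclass whose fields (`replica_id`, `queue_length`) are assumptions, the `rng` parameter is added only to make the example testable, and rejecting an empty list is an extra assumption.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Any


@dataclass
class ReplicaInfo:
    # Hypothetical stand-in for the real ReplicaInfo; only queue_length matters here.
    replica_id: str
    queue_length: int


def select_least_queue_length(
    task: Any,
    replicas: list[ReplicaInfo],
    rng: random.Random | None = None,
) -> int:
    """Return the index of a replica with the smallest queue_length.

    Ties are broken uniformly at random; ``task`` is accepted but unused.
    """
    if not replicas:
        # Assumption: an empty list is rejected as well.
        raise ValueError("replicas must not be empty")
    for replica in replicas:
        if replica.queue_length < 0:
            raise ValueError(
                f"replica {replica.replica_id!r} has negative queue_length "
                f"{replica.queue_length}"
            )
    chooser = rng if rng is not None else random
    shortest = min(r.queue_length for r in replicas)
    # Collect every index that ties for the minimum, then pick one uniformly.
    candidates = [i for i, r in enumerate(replicas) if r.queue_length == shortest]
    return chooser.choice(candidates)


replicas = [ReplicaInfo("a", 3), ReplicaInfo("b", 1), ReplicaInfo("c", 1)]
print(select_least_queue_length(None, replicas))  # 1 or 2, chosen at random
```

Gathering all tied indices before choosing is what makes the tie-break uniform; picking the first minimum instead would always favor the lowest index.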
LoadBalancer
Bases: ABC
Abstract base class for load balancers.
Subclasses implement `select` to choose a replica for a given task.
select abstractmethod
select(task: Task, replicas: list[ReplicaInfo]) -> int
Route a task to one of the available replicas.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `task` | `Task` | The task to route. Not used by the random policy, but reserved for future strategies that may inspect task metadata. | required |
| `replicas` | `list[ReplicaInfo]` | List of available replicas to choose from. | required |
Returns:
| Type | Description |
|---|---|
| `int` | Index of the selected replica in `replicas`. |
Raises:
| Type | Description |
|---|---|
ValueError | If |
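The contract of `select` (take a task and a list of replicas, return an index into that list) can be illustrated with a small `abc` sketch. `Task`, `ReplicaInfo`, and `UniformRandomBalancer` below are hypothetical placeholders, not the library's types; the subclass only shows how a concrete policy fills in the abstract method, and raising on an empty list is an assumption.

```python
from __future__ import annotations

import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class Task:
    # Hypothetical placeholder for the real Task type.
    payload: Any = None


@dataclass
class ReplicaInfo:
    # Hypothetical placeholder for the real ReplicaInfo type.
    replica_id: str
    queue_length: int = 0


class LoadBalancer(ABC):
    """Abstract base class: subclasses pick one replica per task."""

    @abstractmethod
    def select(self, task: Task, replicas: list[ReplicaInfo]) -> int:
        """Return the index of the chosen replica in ``replicas``."""


class UniformRandomBalancer(LoadBalancer):
    """Hypothetical subclass that ignores ``task`` and picks uniformly."""

    def select(self, task: Task, replicas: list[ReplicaInfo]) -> int:
        if not replicas:
            raise ValueError("replicas must not be empty")
        return random.randrange(len(replicas))


balancer = UniformRandomBalancer()
index = balancer.select(Task(), [ReplicaInfo("r0"), ReplicaInfo("r1")])
```

Because `select` is declared with `@abstractmethod`, instantiating the base class directly raises `TypeError`; only subclasses that override it can be constructed.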
LoadBalancingPolicy
Enumeration for load balancing policies.
These policies are used by `LoadBalancer` implementations to route tasks to a subset of available replicas.
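This page does not list the enum's members. As a hypothetical sketch, assuming one member per balancer class documented here (the member names and string values below are invented), a policy parsed from a config string could be mapped to a selection strategy like this. Each selector works on a plain list of queue lengths, and `make_selector` is an illustrative helper, not a library function.

```python
import enum
import itertools
import random
from typing import Callable, Sequence

# A selector maps the replicas' queue lengths to the index of the chosen replica.
Selector = Callable[[Sequence[int]], int]


class LoadBalancingPolicy(enum.Enum):
    # Hypothetical members; the real enum's names and values are not shown on this page.
    RANDOM = "random"
    ROUND_ROBIN = "round_robin"
    LEAST_QUEUE_LENGTH = "least_queue_length"


def make_selector(policy: LoadBalancingPolicy) -> Selector:
    """Build a selector for the given policy (no locking; single-threaded use only)."""
    if policy is LoadBalancingPolicy.RANDOM:
        return lambda queues: random.randrange(len(queues))
    if policy is LoadBalancingPolicy.ROUND_ROBIN:
        counter = itertools.count()
        return lambda queues: next(counter) % len(queues)
    if policy is LoadBalancingPolicy.LEAST_QUEUE_LENGTH:
        def least(queues: Sequence[int]) -> int:
            shortest = min(queues)
            return random.choice([i for i, q in enumerate(queues) if q == shortest])
        return least
    raise ValueError(f"unsupported policy: {policy}")


policy = LoadBalancingPolicy("round_robin")  # e.g. parsed from a config value
pick = make_selector(policy)
```

Keying the enum by string value lets `LoadBalancingPolicy("round_robin")` validate a config value in one step: an unknown string raises `ValueError`.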
RandomBalancer
RoundRobinBalancer
Bases: LoadBalancer
Load balancer that selects replicas in a round-robin fashion.
This implementation keeps a running index modulo `len(replicas)`. It therefore assumes that the `replicas` list keeps the same order and refers to the same backends between calls. If the list's length or ordering changes, the sequence of picks may skip or repeat entries relative to a fixed set of backends.
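The skip/repeat effect described above can be reproduced with a short simulation. This is a sketch under one assumption: the stored index is taken modulo the current list length on every call and the next index is the pick plus one. `simulate_round_robin` is a hypothetical helper, and each entry of `snapshots` stands for the `replicas` list as seen by one call.

```python
def simulate_round_robin(snapshots: list[list[str]]) -> list[str]:
    """Replay round-robin picks; snapshots[k] is the replicas list at call k.

    The running index is reduced modulo the *current* list length on each call.
    """
    next_index = 0
    chosen = []
    for replicas in snapshots:
        idx = next_index % len(replicas)
        chosen.append(replicas[idx])
        next_index = idx + 1
    return chosen


# Replica "A" disappears after two calls: "B" is picked twice in a row.
shrunk = simulate_round_robin([["A", "B", "C"], ["A", "B", "C"], ["B", "C"], ["B", "C"]])
# Same backends, new order on the second call: "B" is skipped.
reordered = simulate_round_robin([["A", "B", "C"], ["B", "C", "A"]])
```

Here `shrunk` is `["A", "B", "B", "C"]` and `reordered` is `["A", "C"]`: the index sequence itself stays regular, but the backend behind each index has shifted.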
Concurrency: a `threading.Lock` serializes updates to `_next_index` for callers that invoke `select` from multiple threads or alongside threaded infrastructure (e.g. ZMQ receive threads).
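A minimal sketch of a lock-guarded round-robin counter follows. `LockedRoundRobin` is a hypothetical class, not the vllm_omni `RoundRobinBalancer`, and rejecting an empty list is an assumption. The lock makes the read-modify-write of `_next_index` atomic, so concurrent callers over a fixed list still receive a strictly cyclic sequence of indices.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Sequence


class LockedRoundRobin:
    """Round-robin index picker whose counter is guarded by a threading.Lock."""

    def __init__(self) -> None:
        self._next_index = 0
        self._lock = threading.Lock()

    def select(self, task: Any, replicas: Sequence[Any]) -> int:
        if not replicas:
            raise ValueError("replicas must not be empty")
        # Read, reduce, and advance the counter as one atomic step across threads.
        with self._lock:
            idx = self._next_index % len(replicas)
            self._next_index = idx + 1
        return idx


balancer = LockedRoundRobin()
replicas = ["r0", "r1", "r2"]
# 2400 selections issued from 8 worker threads against a fixed list of 3 replicas.
with ThreadPoolExecutor(max_workers=8) as pool:
    picks = list(pool.map(lambda _: balancer.select(None, replicas), range(2400)))
counts = [picks.count(i) for i in range(len(replicas))]
```

Since every call advances the counter exactly once under the lock, `counts` comes out as `[800, 800, 800]` regardless of how the threads interleave. Without the lock, two threads could read the same `_next_index` and return the same replica.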