vllm_omni.worker.base

Base worker class for vLLM-Omni with process-scoped GPU memory accounting.

logger module-attribute

logger = init_logger(__name__)

OmniGPUWorkerBase

Bases: Worker

Base GPU worker for vLLM-Omni with process-scoped memory accounting.

This class overrides determine_available_memory() to use per-process GPU memory tracking via pynvml, allowing multiple stages to initialize concurrently on the same GPU without memory accounting interference.

It also replaces vLLM's TorchProfilerWrapper with OmniTorchProfilerWrapper for custom trace naming, background gzip, and trace path collection.

profiler instance-attribute

profiler = create_omni_profiler(
    profiler_config=profiler_config,
    worker_name=worker_name,
    local_rank=local_rank,
)

determine_available_memory

determine_available_memory() -> int

Process-scoped GPU memory profiling for concurrent stage initialization.

Algorithm
  1. requested_memory = total_gpu_memory * gpu_memory_utilization (computed in init_device from cache_config)

  2. process_memory = memory used by THIS process only (via pynvml)
     - Uses nvmlDeviceGetComputeRunningProcesses to get per-PID memory
     - Supports CUDA_VISIBLE_DEVICES with indices, UUIDs, or MIG IDs

  3. available_kv_cache = requested_memory - process_memory
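
The arithmetic in the steps above can be sketched roughly as follows. This is an illustrative sketch, not the vLLM-Omni implementation: ProcInfo, process_memory_bytes, available_kv_cache_bytes, and query_nvml are hypothetical names, and the sketch does not attempt the CUDA_VISIBLE_DEVICES mapping (indices, UUIDs, MIG IDs) that the real method supports. Only the pynvml calls inside query_nvml are real library functions, and that helper needs pynvml and an NVIDIA GPU to run.

```python
import os
from typing import Iterable, NamedTuple, Optional


class ProcInfo(NamedTuple):
    """Hypothetical stand-in for the per-process records NVML returns."""
    pid: int
    usedGpuMemory: Optional[int]  # NVML may report None when usage is unavailable


def process_memory_bytes(procs: Iterable, pid: int) -> int:
    """Sum the GPU memory attributed to `pid`, ignoring every other process."""
    return sum(p.usedGpuMemory or 0 for p in procs if p.pid == pid)


def available_kv_cache_bytes(total: int, utilization: float, process_memory: int) -> int:
    """requested_memory - process_memory, as in steps 1 and 3 above."""
    requested = int(total * utilization)
    return requested - process_memory


def query_nvml(device_index: int = 0):
    """Return (total_bytes, compute_processes) for one device via pynvml."""
    import pynvml  # imported lazily so the pure helpers work without a GPU

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        total = pynvml.nvmlDeviceGetMemoryInfo(handle).total
        procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
        return total, procs
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    total, procs = query_nvml(0)
    used = process_memory_bytes(procs, os.getpid())
    print(available_kv_cache_bytes(total, 0.9, used))
```

Because only this process's records are summed, memory held by another stage on the same GPU does not reduce the budget computed here, which is the point of process-scoped accounting.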

Fallback

If NVML is unavailable, falls back to profiling data: available = requested - (weights + activations + non_torch)
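
A minimal sketch of how such a fallback might be wired, assuming the NVML lookup is exposed as a callable that either returns a byte count or fails. The function name available_memory and its parameters are hypothetical and chosen only to mirror the formula above; the real method's control flow may differ.

```python
from typing import Callable, Optional


def available_memory(
    requested: int,
    nvml_process_memory: Callable[[], Optional[int]],
    weights: int,
    activations: int,
    non_torch: int,
) -> int:
    """Prefer the per-process NVML figure; otherwise use profiling data."""
    try:
        used = nvml_process_memory()
    except Exception:
        # Assumption: any NVML failure is treated as "NVML unavailable".
        used = None

    if used is not None:
        return requested - used

    # Fallback: available = requested - (weights + activations + non_torch)
    return requested - (weights + activations + non_torch)
```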

handle_sleep_task

handle_sleep_task(task: OmniSleepTask) -> OmniACK

Handle a deterministic sleep command from the main process.

handle_wake_task

handle_wake_task(task: OmniWakeTask) -> OmniACK

Handle a deterministic wake-up command from the main process.

load_model

load_model(*args, **kwargs)

profile

profile(
    is_start: bool = True, profile_prefix: str | None = None
)

Override to set trace filename before starting the profiler.

Parameters:

Name            Type         Description                                         Default
is_start        bool         True to start profiling, False to stop.             True
profile_prefix  str | None   Optional prefix for trace filename (vLLM compat).   None

vLLM's profile() only passes is_start, so we generate a descriptive trace filename here before delegating to the profiler.
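
The override pattern described here can be illustrated with toy classes. Everything in this sketch is hypothetical: ToyProfiler, NamingWorker, and the filename format are stand-ins invented for illustration and do not reflect OmniTorchProfilerWrapper's actual interface or naming scheme.

```python
import time
from typing import Optional


class ToyProfiler:
    """Hypothetical profiler stand-in; not the vLLM-Omni wrapper."""

    def __init__(self) -> None:
        self.trace_name: Optional[str] = None
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False


class NamingWorker:
    """Toy worker that names the trace before delegating to the profiler."""

    def __init__(self, worker_name: str, local_rank: int) -> None:
        self.worker_name = worker_name
        self.local_rank = local_rank
        self.profiler = ToyProfiler()

    def profile(self, is_start: bool = True, profile_prefix: Optional[str] = None) -> None:
        if is_start:
            # The caller may pass only is_start, so build a name ourselves.
            prefix = profile_prefix or self.worker_name
            self.profiler.trace_name = f"{prefix}_rank{self.local_rank}_{int(time.time())}"
            self.profiler.start()
        else:
            self.profiler.stop()
```

Keeping profile_prefix optional means a caller that passes only is_start still gets a usable trace name.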

sleep

sleep(level: int = 1) -> bool

Put the worker to sleep.

Args:
    level: Sleep level. 1 offloads weights to CPU; 2 is a total discard.

wake_up

wake_up(tags: list[str] | None = None) -> bool

Wake the worker up by reloading its data into physical GPU memory.