vllm_omni.worker.base

Base worker class for vLLM-Omni with process-scoped GPU memory accounting.

logger `module-attribute`

logger = init_logger(__name__)

OmniGPUWorkerBase

Bases: Worker

Base GPU worker for vLLM-Omni with process-scoped memory accounting.

This class overrides determine_available_memory() to track GPU memory per process via pynvml, so multiple stages can initialize concurrently on the same GPU without one stage's allocations being counted against another's.

It also replaces vLLM's TorchProfilerWrapper with OmniTorchProfilerWrapper for custom trace naming, background gzip, and trace path collection.
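The background gzip step can be pictured with the standard library alone. This is a minimal sketch under assumptions, not OmniTorchProfilerWrapper's code: the helper names, the `.gz` suffix, deleting the uncompressed file, and appending finished paths to a shared list (one reading of "trace path collection") are all hypothetical.

```python
import gzip
import os
import shutil
import threading


def _compress(src: str, dst: str, collected: list, lock: threading.Lock) -> None:
    # Stream the raw trace into a .gz file so a large trace is never fully in memory.
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    os.remove(src)  # assumption: the uncompressed trace is no longer needed
    with lock:
        collected.append(dst)  # record the final path so it can be reported later


def gzip_trace_in_background(
    path: str, collected: list, lock: threading.Lock
) -> threading.Thread:
    """Hypothetical helper: compress `path` to `path + '.gz'` off the caller's thread."""
    thread = threading.Thread(
        target=_compress, args=(path, path + ".gz", collected, lock), daemon=True
    )
    thread.start()
    return thread
```

Because the thread is a daemon, a caller should `join()` it before exiting; otherwise an interrupted compression could leave a truncated `.gz` file behind.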

profiler `instance-attribute`

profiler = create_omni_profiler(
    profiler_config=profiler_config,
    worker_name=worker_name,
    local_rank=self.local_rank,
)

determine_available_memory

determine_available_memory() -> int

Process-scoped GPU memory profiling for concurrent stage initialization.

Algorithm

1. requested_memory = total_gpu_memory * gpu_memory_utilization (computed in init_device from cache_config)
2. process_memory = memory used by THIS process only (via pynvml)
   - Uses nvmlDeviceGetComputeRunningProcesses to get per-PID memory
   - Supports CUDA_VISIBLE_DEVICES with indices, UUIDs, or MIG IDs
3. available_kv_cache = requested_memory - process_memory
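The three steps above can be sketched in Python as follows. This is a minimal illustration assuming the `pynvml` module from the nvidia-ml-py package; the helper names (`ProcessUsage`, `query_process_usages`, `available_kv_cache_bytes`) are hypothetical, and unlike the real method the sketch takes a physical NVML device index directly instead of resolving CUDA_VISIBLE_DEVICES indices, UUIDs, or MIG IDs.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcessUsage:
    """One entry per compute process, mirroring NVML's per-process report."""
    pid: int
    used_gpu_memory: Optional[int]  # bytes; NVML may report None if unavailable


def query_process_usages(nvml_index: int) -> List[ProcessUsage]:
    """Read per-PID usage for one physical GPU (assumes pynvml is installed)."""
    import pynvml  # imported lazily so the pure helpers below work without a GPU

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(nvml_index)
        procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
        return [ProcessUsage(p.pid, p.usedGpuMemory) for p in procs]
    finally:
        pynvml.nvmlShutdown()


def process_memory_bytes(usages: List[ProcessUsage], pid: Optional[int] = None) -> int:
    """Sum only the entries that belong to `pid` (defaults to this process)."""
    pid = os.getpid() if pid is None else pid
    return sum(u.used_gpu_memory or 0 for u in usages if u.pid == pid)


def available_kv_cache_bytes(
    total_gpu_memory: int,
    gpu_memory_utilization: float,
    usages: List[ProcessUsage],
    pid: Optional[int] = None,
) -> int:
    # Step 1: this stage's budget is a fraction of the whole device.
    requested_memory = int(total_gpu_memory * gpu_memory_utilization)
    # Step 2: subtract only what THIS process holds; other stages' PIDs are ignored.
    process_memory = process_memory_bytes(usages, pid)
    # Step 3: whatever remains of the budget goes to the KV cache.
    return requested_memory - process_memory
```

Filtering by PID is what makes concurrent initialization safe: a device-wide reading such as free memory would count a neighboring stage's weights against this stage's budget.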

Fallback

If NVML is unavailable, the method falls back to profiling data: available_kv_cache = requested_memory - (weights + activations + non_torch)
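A sketch of how the NVML path and the profiling fallback might fit together, assuming byte counts throughout. `ProfileResult`, its field names, and `available_memory` are hypothetical stand-ins for whatever the profiling run actually returns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProfileResult:
    """Byte counts from a profiling run (field names are illustrative)."""
    weights: int
    activations: int  # peak activation memory observed during profiling
    non_torch: int  # memory outside PyTorch's allocator, e.g. the CUDA context


def available_memory(
    requested_memory: int,
    nvml_process_memory: Optional[int],
    profile: ProfileResult,
) -> int:
    if nvml_process_memory is not None:
        # Preferred path: per-process usage reported by NVML.
        return requested_memory - nvml_process_memory
    # Fallback path: rebuild this process's footprint from the profiling data.
    return requested_memory - (profile.weights + profile.activations + profile.non_torch)
```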

handle_sleep_task

handle_sleep_task(task: OmniSleepTask) -> OmniACK

Handle a deterministic sleep command from the main process.

handle_wake_task

handle_wake_task(task: OmniWakeTask) -> OmniACK

Handle a deterministic wake-up command from the main process.
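The two handlers imply a request/acknowledge exchange with the main process. The sketch below shows one way to structure it; `SleepTask`, `WakeTask`, and `Ack` are hypothetical stand-ins for OmniSleepTask, OmniWakeTask, and OmniACK, whose real fields are not documented here, and echoing a `task_id` is an assumption about how a reply is matched to its command.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class SleepTask:  # hypothetical stand-in for OmniSleepTask
    task_id: int
    level: int


@dataclass(frozen=True)
class WakeTask:  # hypothetical stand-in for OmniWakeTask
    task_id: int
    tags: Optional[Tuple[str, ...]] = None


@dataclass(frozen=True)
class Ack:  # hypothetical stand-in for OmniACK
    task_id: int
    ok: bool


def handle_sleep_task(task: SleepTask, sleep: Callable[[int], bool]) -> Ack:
    # Run the sleep synchronously, then echo the task id so the main process
    # can match this acknowledgement to the command it sent.
    return Ack(task.task_id, sleep(task.level))


def handle_wake_task(
    task: WakeTask, wake_up: Callable[[Optional[List[str]]], bool]
) -> Ack:
    tags = list(task.tags) if task.tags is not None else None
    return Ack(task.task_id, wake_up(tags))
```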

load_model

load_model(*args, **kwargs)

profile

profile(
    is_start: bool = True, profile_prefix: str | None = None
)

Override to set trace filename before starting the profiler.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `is_start` | `bool` | True to start profiling, False to stop. | `True` |
| `profile_prefix` | `str \| None` | Optional prefix for trace filename (vLLM compat). | `None` |

vLLM's profile() only passes is_start, so we generate a descriptive trace filename here before delegating to the profiler.
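A sketch of the name-then-delegate flow, using a stand-in profiler so it runs without PyTorch. `RecordingProfiler`, `make_trace_filename`, and the `prefix_worker_rankN_timestamp` naming scheme are hypothetical; the real trace names come from OmniTorchProfilerWrapper.

```python
import time
from typing import List, Optional


class RecordingProfiler:
    """Stand-in profiler that records calls (hypothetical interface)."""

    def __init__(self) -> None:
        self.trace_filename: Optional[str] = None
        self.events: List[str] = []

    def start(self) -> None:
        self.events.append(f"start:{self.trace_filename}")

    def stop(self) -> None:
        self.events.append("stop")


def make_trace_filename(worker_name: str, rank: int, prefix: Optional[str] = None) -> str:
    # Hypothetical scheme: optional prefix, worker name, rank, wall-clock stamp.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    parts = [p for p in (prefix, worker_name, f"rank{rank}", stamp) if p]
    return "_".join(parts)


def profile(profiler: RecordingProfiler, worker_name: str, rank: int,
            is_start: bool = True, profile_prefix: Optional[str] = None) -> None:
    if is_start:
        # Name the trace before starting, since the caller only supplies is_start.
        profiler.trace_filename = make_trace_filename(worker_name, rank, profile_prefix)
        profiler.start()
    else:
        profiler.stop()
```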

sleep

sleep(level: int = 1) -> bool

Put the worker to sleep. `level` selects the mode: 1 offloads weights to CPU; 2 discards them entirely.
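The two sleep levels can be modeled in plain Python, with lists standing in for GPU tensors. This is an illustrative toy under assumptions, not the worker's implementation; `ToyWorker` is hypothetical, and rejecting unknown levels with `ValueError` is a choice made for the sketch.

```python
from typing import List, Optional


class ToyWorker:
    """Plain-Python model of the two sleep levels; lists stand in for GPU tensors."""

    def __init__(self, weights: List[float]) -> None:
        self.gpu_weights: Optional[List[float]] = list(weights)
        self.cpu_backup: Optional[List[float]] = None

    def sleep(self, level: int = 1) -> bool:
        if level not in (1, 2):
            raise ValueError(f"unsupported sleep level: {level}")
        if level == 1:
            # Level 1: keep a host-side copy so wake-up can restore weights quickly.
            self.cpu_backup = self.gpu_weights
        else:
            # Level 2: total discard; weights must be reloaded from their source later.
            self.cpu_backup = None
        self.gpu_weights = None  # device memory is released in both cases
        return True
```

The trade-off is host RAM versus wake-up latency: level 1 needs enough CPU memory to hold the weights but restores them with a copy, while level 2 frees everything and pays for a full reload.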

wake_up

wake_up(tags: list[str] | None = None) -> bool

Reload the worker's offloaded or discarded memory back onto the physical GPU.
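The `tags` parameter is not documented here. Assuming it follows vLLM's sleep-mode convention, where tags name the memory pools to reallocate (`"weights"` and `"kv_cache"`) and `None` means all of them, the selection logic might look like this; `tags_to_reload` and `KNOWN_TAGS` are hypothetical.

```python
from typing import List, Optional

# Assumption: tag names borrowed from vLLM's sleep-mode wake_up(tags=...).
KNOWN_TAGS = ("weights", "kv_cache")


def tags_to_reload(tags: Optional[List[str]]) -> List[str]:
    """Return which memory pools a wake_up(tags) call should bring back onto the GPU."""
    if tags is None:
        return list(KNOWN_TAGS)  # no tags means reload everything
    unknown = [t for t in tags if t not in KNOWN_TAGS]
    if unknown:
        raise ValueError(f"unknown wake-up tags: {unknown}")
    # Keep a stable order regardless of how the caller listed the tags.
    return [t for t in KNOWN_TAGS if t in tags]
```

Splitting the reload lets a caller restore weights first, for example to update them in place, and only then reallocate the KV cache.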