vllm_omni.worker.base ¶
Base worker class for vLLM-Omni with process-scoped GPU memory accounting.
OmniGPUWorkerBase ¶
Bases: Worker
Base GPU worker for vLLM-Omni with process-scoped memory accounting.
This class overrides determine_available_memory() to use per-process GPU memory tracking via pynvml, allowing multiple stages to initialize concurrently on the same GPU without memory accounting interference.
It also replaces vLLM's TorchProfilerWrapper with OmniTorchProfilerWrapper for custom trace naming, background gzip, and trace path collection.
profiler instance-attribute ¶
profiler = create_omni_profiler(
    profiler_config=profiler_config,
    worker_name=worker_name,
    local_rank=local_rank,
)
determine_available_memory ¶
determine_available_memory() -> int
Process-scoped GPU memory profiling for concurrent stage initialization.
Algorithm
- requested_memory = total_gpu_memory * gpu_memory_utilization (computed in init_device from cache_config)
- process_memory = memory used by THIS process only (via pynvml)
    - Uses nvmlDeviceGetComputeRunningProcesses to get per-PID memory
    - Supports CUDA_VISIBLE_DEVICES with indices, UUIDs, or MIG IDs
- available_kv_cache = requested_memory - process_memory
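The arithmetic in these steps can be sketched in a few lines. This is a minimal illustration, not the worker's actual implementation: `ProcInfo`, `process_memory_bytes`, and `available_kv_cache_bytes` are hypothetical names. `ProcInfo` stands in for the per-process records that `nvmlDeviceGetComputeRunningProcesses` returns, which expose `pid` and `usedGpuMemory`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ProcInfo:
    """Hypothetical stand-in for an NVML per-process record."""
    pid: int
    usedGpuMemory: Optional[int]  # bytes; NVML may report None


def process_memory_bytes(procs: Iterable[ProcInfo], pid: int) -> int:
    """Sum the GPU memory attributed to one PID, ignoring all other processes."""
    return sum(p.usedGpuMemory or 0 for p in procs if p.pid == pid)


def available_kv_cache_bytes(total_gpu_memory: int,
                             gpu_memory_utilization: float,
                             process_memory: int) -> int:
    """Compute requested_memory - process_memory."""
    requested_memory = int(total_gpu_memory * gpu_memory_utilization)
    return requested_memory - process_memory


GiB = 1024 ** 3
# Two stages share one 80 GiB GPU; only PID 100 is "this" process.
procs = [ProcInfo(pid=100, usedGpuMemory=6 * GiB),
         ProcInfo(pid=200, usedGpuMemory=10 * GiB)]
mine = process_memory_bytes(procs, pid=100)
available = available_kv_cache_bytes(80 * GiB, 0.5, mine)
```

Because only the current PID's usage is subtracted, memory held by the other stage (PID 200 here) does not reduce this worker's budget.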
Fallback
If NVML is unavailable, falls back to profiling data: available = requested - (weights + activations + non_torch)
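The choice between the NVML path and the fallback can be sketched as a single function. This is an illustrative sketch under stated assumptions: `available_memory_bytes` is a hypothetical name, and "NVML unavailable" is modelled by passing `None` for `process_memory`.

```python
from typing import Optional


def available_memory_bytes(requested: int,
                           process_memory: Optional[int],
                           weights: int,
                           activations: int,
                           non_torch: int) -> int:
    """Prefer per-process accounting; otherwise use profiling data."""
    if process_memory is not None:
        # NVML path: subtract only what this process holds.
        return requested - process_memory
    # Fallback path: available = requested - (weights + activations + non_torch)
    return requested - (weights + activations + non_torch)


with_nvml = available_memory_bytes(40, 6, weights=10, activations=5, non_torch=1)
without_nvml = available_memory_bytes(40, None, weights=10, activations=5, non_torch=1)
```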
handle_sleep_task ¶
handle_sleep_task(task: OmniSleepTask) -> OmniACK
Handle a deterministic sleep command from the main process.
handle_wake_task ¶
handle_wake_task(task: OmniWakeTask) -> OmniACK
Handle a deterministic wake-up command from the main process.
profile ¶
Override to set trace filename before starting the profiler.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| is_start | bool | True to start profiling, False to stop. | True |
| profile_prefix | str \| None | Optional prefix for trace filename (vLLM compat). | None |
vLLM's profile() only passes is_start, so we generate a descriptive trace filename here before delegating to the profiler.
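As a rough illustration of building a descriptive trace filename before starting the profiler, the sketch below joins an optional prefix, the worker name, the local rank, and a timestamp. The naming scheme and the `build_trace_filename` helper are assumptions made for this example; the format vLLM-Omni actually uses is not specified here.

```python
from datetime import datetime, timezone
from typing import Optional


def build_trace_filename(worker_name: str,
                         local_rank: int,
                         profile_prefix: Optional[str] = None,
                         now: Optional[datetime] = None) -> str:
    """Hypothetical scheme: [prefix_]worker_rankN_timestamp."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%S")
    parts = [profile_prefix, worker_name, f"rank{local_rank}", stamp]
    # Drop the prefix when it is None or empty.
    return "_".join(p for p in parts if p)


fixed = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
plain = build_trace_filename("stage0", 1, now=fixed)
prefixed = build_trace_filename("stage0", 1, profile_prefix="run", now=fixed)
```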
sleep ¶
Put the worker to sleep. The level argument selects the mode: 1 offloads weights to CPU; 2 is a total discard.
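A caller might want to validate the level before dispatching a sleep request. The sketch below is one way to do that; `SleepLevel` and `parse_sleep_level` are hypothetical names for illustration and are not part of the documented API.

```python
from enum import IntEnum


class SleepLevel(IntEnum):
    """Hypothetical names for the two documented sleep levels."""
    OFFLOAD_WEIGHTS = 1  # level 1: offload weights to CPU
    TOTAL_DISCARD = 2    # level 2: total discard


def parse_sleep_level(level: int) -> SleepLevel:
    """Reject any level other than 1 or 2 with a clear error."""
    try:
        return SleepLevel(level)
    except ValueError:
        raise ValueError(
            f"unsupported sleep level: {level!r} (expected 1 or 2)"
        ) from None


level = parse_sleep_level(1)
```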