vllm_omni.platforms

Modules:

Name Description
cuda
interface
musa
npu
rocm
xpu

builtin_omni_platform_plugins module-attribute

builtin_omni_platform_plugins = {
    "cuda": cuda_omni_platform_plugin,
    "rocm": rocm_omni_platform_plugin,
    "npu": npu_omni_platform_plugin,
    "xpu": xpu_omni_platform_plugin,
    "musa": musa_omni_platform_plugin,
}
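
The registry above maps a platform name to a plugin function that returns `str | None`. The following is a minimal sketch of how such a registry could be scanned, assuming that a non-None return value is the qualified name of the platform class and that None means the platform is not available. The plugin functions, the class path, and `find_activated` are hypothetical stand-ins, not part of vllm_omni.

```python
from typing import Callable, Optional

# Hypothetical stand-ins for the real plugin functions. Each returns a
# qualified class name when its platform is usable, otherwise None.
def fake_cuda_plugin() -> Optional[str]:
    return None  # pretend CUDA is not available on this machine

def fake_npu_plugin() -> Optional[str]:
    return "example_pkg.platforms.npu.NPUOmniPlatform"  # hypothetical path

plugins: dict[str, Callable[[], Optional[str]]] = {
    "cuda": fake_cuda_plugin,
    "npu": fake_npu_plugin,
}

def find_activated(registry: dict[str, Callable[[], Optional[str]]]) -> dict[str, str]:
    """Call every plugin and keep only those that report a class path."""
    activated = {}
    for name, plugin in registry.items():
        qualname = plugin()
        if qualname is not None:
            activated[name] = qualname
    return activated

activated = find_activated(plugins)
print(activated)
```

How the real resolver handles zero or multiple activated plugins is not stated in this reference, so the sketch only collects the results.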

current_omni_platform module-attribute

current_omni_platform: OmniPlatform

logger module-attribute

logger = getLogger(__name__)

OmniPlatform

Bases: Platform

Abstract base class for vllm-omni platforms.

Inherits from vLLM's Platform, so OmniPlatform has all vLLM Platform capabilities, and adds Omni-specific methods on top.

create_autocast_context classmethod

create_autocast_context(
    *, device_type: str, dtype: dtype, enabled: bool = True
)

get_default_stage_config_path classmethod

get_default_stage_config_path() -> str

get_device_count classmethod

get_device_count() -> int

get_device_version classmethod

get_device_version() -> str | None

get_diffusion_attn_backend_cls classmethod

get_diffusion_attn_backend_cls(
    selected_backend: str | None, head_size: int
) -> str

Get the diffusion attention backend class path for this platform.

This method selects the appropriate attention backend for diffusion models based on platform capabilities and user preferences.

Parameters:

Name              Type        Description                                                                                                  Default
selected_backend  str | None  User-selected backend name (e.g., "FLASH_ATTN", "TORCH_SDPA", "SAGE_ATTN"). If None, uses platform default.  required
head_size         int         Attention head size.                                                                                         required

Returns:

Type  Description
str   Fully qualified class path of the selected backend.
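
The selection described above (explicit user choice, otherwise the platform default) can be illustrated with a minimal sketch. The backend table, the class paths, the default, and `pick_backend` are all assumptions made for illustration. The reference does not say how `head_size` influences the choice, so the sketch accepts it but does not use it.

```python
from typing import Optional

# Hypothetical name-to-class-path table. The real paths are defined by
# each platform and are not listed in this reference.
BACKENDS = {
    "FLASH_ATTN": "example_pkg.attention.FlashAttentionBackend",
    "TORCH_SDPA": "example_pkg.attention.SDPABackend",
}
DEFAULT_BACKEND = "TORCH_SDPA"  # assumed platform default

def pick_backend(selected_backend: Optional[str], head_size: int) -> str:
    """Return a class path for the requested backend, or the default."""
    # head_size is part of the documented signature; it is unused here
    # because its effect on selection is not documented.
    name = selected_backend if selected_backend is not None else DEFAULT_BACKEND
    if name not in BACKENDS:
        raise ValueError(f"Unknown attention backend: {name!r}")
    return BACKENDS[name]

print(pick_backend(None, head_size=64))
print(pick_backend("FLASH_ATTN", head_size=128))
```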

get_diffusion_model_impl_qualname classmethod

get_diffusion_model_impl_qualname(op_name: str) -> str

get_diffusion_model_runner_cls classmethod

get_diffusion_model_runner_cls() -> str

Get the diffusion model runner class path for this platform.

Returns a fully qualified class path string. The class must be compatible with the DiffusionModelRunner interface.

get_diffusion_packed_modules_mapping classmethod

get_diffusion_packed_modules_mapping(
    model_class: type[Module],
) -> dict[str, list[str]] | None

get_diffusion_worker_cls classmethod

get_diffusion_worker_cls() -> str

Get the diffusion worker class path for this platform.

Returns a fully qualified class path string that will be resolved and instantiated by WorkerWrapperBase. The class must be compatible with the DiffusionWorker interface.
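
Several methods in this class return a fully qualified class path string rather than a class object. As a rough illustration of how such a string can be turned into a class, here is a generic sketch using `importlib`. It does not reproduce what WorkerWrapperBase actually does; `resolve_class` is a hypothetical helper, and a standard-library class stands in for a worker class.

```python
import importlib

def resolve_class(qualname: str) -> type:
    """Import the module part of a dotted path and return the named class."""
    module_name, _, class_name = qualname.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'package.module.ClassName', got {qualname!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# A standard-library class is used here so the sketch runs anywhere.
cls = resolve_class("collections.OrderedDict")
instance = cls()
print(cls.__name__)
```

Returning a string defers the import until the class is needed, which keeps platform-specific dependencies from being loaded on platforms that do not use them.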

get_free_memory classmethod

get_free_memory(device: device | None = None) -> int

get_graph_wrapper_cls classmethod

get_graph_wrapper_cls() -> type

Return the platform's full-graph wrapper class.

Defaults to vLLM's CUDAGraphWrapper; NPU overrides with ACLGraphWrapper.

get_omni_ar_worker_cls classmethod

get_omni_ar_worker_cls() -> str

get_omni_generation_worker_cls classmethod

get_omni_generation_worker_cls() -> str

get_profiler_cls classmethod

get_profiler_cls() -> str

Get the profiler class for this platform.

Returns:

Type  Description
str   Fully qualified class path of the profiler. The default implementation returns the base OmniTorchProfilerWrapper.

get_torch_device classmethod

get_torch_device(local_rank: int | None = None) -> device

has_flash_attn_package classmethod

has_flash_attn_package() -> bool

Check if a Flash Attention package is available and usable on this platform.

is_cuda

is_cuda() -> bool

is_musa

is_musa() -> bool

is_npu

is_npu() -> bool

is_out_of_tree

is_out_of_tree() -> bool

is_rocm

is_rocm() -> bool

is_xpu

is_xpu() -> bool

prepare_diffusion_op_runtime classmethod

prepare_diffusion_op_runtime(
    op_name: str, **kwargs: Any
) -> None

set_device_control_env_var classmethod

set_device_control_env_var(
    devices: str | int | None,
) -> None

set_forward_context classmethod

set_forward_context(
    attn_metadata: Any,
    vllm_config: VllmConfig,
    *,
    cudagraph_runtime_mode: CUDAGraphMode,
    batch_descriptor: BatchDescriptor,
)

Platform-neutral wrapper around the device's set_forward_context.

Defaults to vLLM's set_forward_context; NPU overrides to dispatch to set_ascend_forward_context (renaming cudagraph_runtime_mode to aclgraph_runtime_mode).
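
The override pattern described above, where a platform forwards the call to its own function and renames one keyword argument, might look like the following sketch. Everything here is a stand-in: `fake_ascend_forward_context` and `FakeNPUPlatform` are hypothetical, and the sketch assumes the forward context is used as a context manager.

```python
from contextlib import contextmanager

calls = []  # records the keyword value the stand-in backend received

@contextmanager
def fake_ascend_forward_context(attn_metadata, vllm_config, *,
                                aclgraph_runtime_mode, batch_descriptor):
    # Stand-in for a device-specific forward context.
    calls.append(aclgraph_runtime_mode)
    yield

class FakeNPUPlatform:
    @classmethod
    def set_forward_context(cls, attn_metadata, vllm_config, *,
                            cudagraph_runtime_mode, batch_descriptor):
        # Callers use the platform-neutral keyword; the override maps it
        # to the name the device-specific function expects.
        return fake_ascend_forward_context(
            attn_metadata,
            vllm_config,
            aclgraph_runtime_mode=cudagraph_runtime_mode,
            batch_descriptor=batch_descriptor,
        )

with FakeNPUPlatform.set_forward_context(
    None, None, cudagraph_runtime_mode="FULL", batch_descriptor=None
):
    pass

print(calls)
```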

supports_cpu_offload classmethod

supports_cpu_offload() -> bool

supports_float64 classmethod

supports_float64() -> bool

supports_torch_inductor classmethod

supports_torch_inductor() -> bool

Check if the platform supports torch.compile with inductor backend.

synchronize classmethod

synchronize() -> None

unset_device_control_env_var classmethod

unset_device_control_env_var() -> None

OmniPlatformEnum

Bases: Enum

Enum for supported Omni platforms.

CUDA class-attribute instance-attribute

CUDA = 'cuda'

MUSA class-attribute instance-attribute

MUSA = 'musa'

NPU class-attribute instance-attribute

NPU = 'npu'

OOT class-attribute instance-attribute

OOT = 'oot'

ROCM class-attribute instance-attribute

ROCM = 'rocm'

UNSPECIFIED class-attribute instance-attribute

UNSPECIFIED = 'unspecified'

XPU class-attribute instance-attribute

XPU = 'xpu'
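
Because each member's value is its lowercase name, a member can be looked up directly from a string. The sketch below redefines the enum locally with the values documented above so that it runs without vllm_omni installed. `parse_platform` and its fallback to UNSPECIFIED are assumptions for illustration, not documented behavior.

```python
from enum import Enum

class OmniPlatformEnum(Enum):
    # Local copy of the documented members and values.
    CUDA = "cuda"
    MUSA = "musa"
    NPU = "npu"
    OOT = "oot"
    ROCM = "rocm"
    UNSPECIFIED = "unspecified"
    XPU = "xpu"

def parse_platform(name: str) -> OmniPlatformEnum:
    """Hypothetical helper: map a string to a member, else UNSPECIFIED."""
    try:
        return OmniPlatformEnum(name.lower())
    except ValueError:
        return OmniPlatformEnum.UNSPECIFIED

print(parse_platform("CUDA"))
print(parse_platform("tpu"))
```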

cuda_omni_platform_plugin

cuda_omni_platform_plugin() -> str | None

Check if CUDA OmniPlatform should be activated.

musa_omni_platform_plugin

musa_omni_platform_plugin() -> str | None

Check if MUSA OmniPlatform should be activated.

npu_omni_platform_plugin

npu_omni_platform_plugin() -> str | None

Check if NPU OmniPlatform should be activated.

resolve_current_omni_platform_cls_qualname

resolve_current_omni_platform_cls_qualname() -> str

Resolve the current OmniPlatform class qualified name.

rocm_omni_platform_plugin

rocm_omni_platform_plugin() -> str | None

Check if ROCm OmniPlatform should be activated.

xpu_omni_platform_plugin

xpu_omni_platform_plugin() -> str | None

Check if XPU OmniPlatform should be activated.