Summary
Entry Points
Main entry points for vLLM-Omni inference and serving.
- vllm_omni.entrypoints.async_omni.AsyncEventResolver
- vllm_omni.entrypoints.async_omni.AsyncOmni
- vllm_omni.entrypoints.cli.benchmark.base.OmniBenchmarkSubcommandBase
- vllm_omni.entrypoints.cli.benchmark.main.OmniBenchmarkSubcommand
- vllm_omni.entrypoints.cli.benchmark.serve.OmniBenchmarkServingSubcommand
- vllm_omni.entrypoints.cli.serve.OmniServeCommand
- vllm_omni.entrypoints.client_request_state.ClientRequestState
- vllm_omni.entrypoints.omni.Omni
- vllm_omni.entrypoints.omni_base.OmniBase
- vllm_omni.entrypoints.omni_base.OmniEngineDeadError
- vllm_omni.entrypoints.openpi.connection.RobotRealtimeConnection
- vllm_omni.entrypoints.openpi.serving.PolicyServerConfig
- vllm_omni.entrypoints.openpi.serving.ServingRealtimeRobotOpenPI
- vllm_omni.entrypoints.pd_utils.PDDisaggregationMixin
Inputs
Data structures for multimodal inputs.
- vllm_omni.inputs.data.OmniCustomPrompt
- vllm_omni.inputs.data.OmniDiffusionSamplingParams
- vllm_omni.inputs.data.OmniEmbedsPrompt
- vllm_omni.inputs.data.OmniTextPrompt
- vllm_omni.inputs.data.OmniTokenInputs
- vllm_omni.inputs.data.OmniTokensPrompt
- vllm_omni.inputs.preprocess.OmniInputPreprocessor
Engine
Engine classes for offline and online inference.
- vllm_omni.diffusion.diffusion_engine.DiffusionEngine
- vllm_omni.distributed.omni_connectors.connectors.mooncake_transfer_engine_connector.MooncakeAgentMetadata
- vllm_omni.distributed.omni_connectors.connectors.mooncake_transfer_engine_connector.MooncakeTransferEngineConnector
- vllm_omni.distributed.omni_connectors.connectors.mooncake_transfer_engine_connector.QueryRequest
- vllm_omni.distributed.omni_connectors.connectors.mooncake_transfer_engine_connector.QueryResponse
- vllm_omni.engine.AdditionalInformationEntry
- vllm_omni.engine.AdditionalInformationPayload
- vllm_omni.engine.OmniEngineCoreOutput
- vllm_omni.engine.OmniEngineCoreOutputs
- vllm_omni.engine.OmniEngineCoreRequest
- vllm_omni.engine.PromptEmbedsPayload
- vllm_omni.engine.arg_utils.OmniAsyncEngineArgs
- vllm_omni.engine.arg_utils.OmniEngineArgs
- vllm_omni.engine.arg_utils.OrchestratorArgs
- vllm_omni.engine.async_omni_engine.AsyncOmniEngine
- vllm_omni.engine.async_omni_engine.StageRuntimeInfo
- vllm_omni.engine.cfg_companion_tracker.CfgCompanionTracker
- vllm_omni.engine.messages.AbortRequestMessage
- vllm_omni.engine.messages.AddCompanionRequestMessage
- vllm_omni.engine.messages.CollectiveRPCRequestMessage
- vllm_omni.engine.messages.CollectiveRPCResultMessage
- vllm_omni.engine.messages.EngineQueueMessage
- vllm_omni.engine.messages.ErrorMessage
- vllm_omni.engine.messages.OutputMessage
- vllm_omni.engine.messages.RegisterRemoteReplicaMessage
- vllm_omni.engine.messages.ShutdownRequestMessage
- vllm_omni.engine.messages.StageMetricsMessage
- vllm_omni.engine.messages.StageSubmissionMessage
- vllm_omni.engine.messages.UnregisterRemoteReplicaMessage
- vllm_omni.engine.mm_outputs.MultimodalCompletionOutput
- vllm_omni.engine.mm_outputs.MultimodalPayload
- vllm_omni.engine.omni_core_engine_proc_manager.OmniCoreEngineProcManager
- vllm_omni.engine.orchestrator.Orchestrator
- vllm_omni.engine.orchestrator.OrchestratorRequestState
- vllm_omni.engine.orchestrator.StreamingInputState
- vllm_omni.engine.output_modality.OutputModality
- vllm_omni.engine.output_modality.OutputModalityNames
- vllm_omni.engine.output_modality.TensorAccumulationStrategy
- vllm_omni.engine.output_processor.MultimodalOutputProcessor
- vllm_omni.engine.output_processor.OmniRequestState
- vllm_omni.engine.stage_client.StageClient
- vllm_omni.engine.stage_client.StageClientBase
- vllm_omni.engine.stage_client.StagePoolClient
- vllm_omni.engine.stage_client.StagePoolDiffusionClient
- vllm_omni.engine.stage_client.StagePoolLLMClient
- vllm_omni.engine.stage_engine_core_client.DPLBStageEngineCoreClient
- vllm_omni.engine.stage_engine_core_client.StageEngineCoreClient
- vllm_omni.engine.stage_engine_core_client.StageEngineCoreClientBase
- vllm_omni.engine.stage_engine_core_proc.StageEngineCoreProc
- vllm_omni.engine.stage_engine_startup.OmniMasterServer
- vllm_omni.engine.stage_engine_startup.StageAllocation
- vllm_omni.engine.stage_engine_startup.StageCoordinatorAddresses
- vllm_omni.engine.stage_engine_startup.StageRegistrationResponse
- vllm_omni.engine.stage_init_utils.LogicalStageInitPlan
- vllm_omni.engine.stage_init_utils.ReplicaInitPlan
- vllm_omni.engine.stage_init_utils.StageMetadata
- vllm_omni.engine.stage_init_utils.StageRemoteFactoryContext
- vllm_omni.engine.stage_pool.StagePool
- vllm_omni.platforms.npu.omni_connectors.yuanrong_transfer_engine_connector.CleanupRequest
- vllm_omni.platforms.npu.omni_connectors.yuanrong_transfer_engine_connector.QueryRequest
- vllm_omni.platforms.npu.omni_connectors.yuanrong_transfer_engine_connector.QueryResponse
- vllm_omni.platforms.npu.omni_connectors.yuanrong_transfer_engine_connector.YuanrongTransferEngineConnector
Core
Core scheduling and caching components.
- vllm_omni.core.prefix_cache.OmniTensorPrefixCache
- vllm_omni.core.sched.omni_ar_scheduler.KVCacheTransferData
- vllm_omni.core.sched.omni_ar_scheduler.OmniARAsyncScheduler
- vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
- vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler
- vllm_omni.core.sched.omni_scheduler_mixin.OmniSchedulerMixin
- vllm_omni.core.sched.omni_scheduling_coordinator.OmniSchedulingCoordinator
- vllm_omni.core.sched.output.OmniCachedRequestData
- vllm_omni.core.sched.output.OmniChunkRecvHandle
- vllm_omni.core.sched.output.OmniNewRequestData
- vllm_omni.core.sched.output.OmniSchedulerOutput
- vllm_omni.model_executor.models.cosyvoice3.code2wav_core.cfm.BASECFM
- vllm_omni.model_executor.models.cosyvoice3.code2wav_core.cfm.CausalConditionalCFM
- vllm_omni.model_executor.models.cosyvoice3.code2wav_core.cfm.CausalMaskedDiffWithDiT
- vllm_omni.model_executor.models.cosyvoice3.code2wav_core.cfm.ConditionalCFM
- vllm_omni.model_executor.models.cosyvoice3.code2wav_core.hifigan.CausalConv1d
- vllm_omni.model_executor.models.cosyvoice3.code2wav_core.hifigan.CausalConv1dUpsample
- vllm_omni.model_executor.models.cosyvoice3.code2wav_core.hifigan.CausalConvRNNF0Predictor
- vllm_omni.model_executor.models.cosyvoice3.code2wav_core.hifigan.CausalHiFTGenerator
- vllm_omni.model_executor.models.cosyvoice3.code2wav_core.hifigan.HiFTGenerator
- vllm_omni.model_executor.models.cosyvoice3.code2wav_core.hifigan.SineGen
- vllm_omni.model_executor.models.cosyvoice3.code2wav_core.hifigan.SineGen2
- vllm_omni.model_executor.models.cosyvoice3.code2wav_core.hifigan.SourceModuleHnNSF
- vllm_omni.model_executor.models.qwen3_tts.tokenizer_25hz.vq.core_vq.DistributedGroupResidualVectorQuantization
- vllm_omni.model_executor.models.qwen3_tts.tokenizer_25hz.vq.core_vq.DistributedResidualVectorQuantization
- vllm_omni.model_executor.models.qwen3_tts.tokenizer_25hz.vq.core_vq.EuclideanCodebook
- vllm_omni.model_executor.models.qwen3_tts.tokenizer_25hz.vq.core_vq.VectorQuantization
- vllm_omni.model_executor.models.qwen3_tts.tokenizer_25hz.vq.core_vq.preprocess
Configuration
Configuration classes.
- vllm_omni.config.model.OmniModelArchConfigConvertor
- vllm_omni.config.model.OmniModelConfig
- vllm_omni.config.stage_config.DeployConfig
- vllm_omni.config.stage_config.ModelPipeline
- vllm_omni.config.stage_config.PipelineConfig
- vllm_omni.config.stage_config.PlatformOverrides
- vllm_omni.config.stage_config.StageConfig
- vllm_omni.config.stage_config.StageConfigFactory
- vllm_omni.config.stage_config.StageDeployConfig
- vllm_omni.config.stage_config.StageExecutionType
- vllm_omni.config.stage_config.StagePipelineConfig
- vllm_omni.config.stage_config.StageType
- vllm_omni.diffusion.cache.magcache.config.MagCacheConfig
- vllm_omni.diffusion.cache.teacache.config.TeaCacheConfig
- vllm_omni.diffusion.models.dmd2.config.DMD2Config
- vllm_omni.diffusion.models.internvla_a1.config.InternVLAA1Config
- vllm_omni.diffusion.models.internvla_a1.config.InternVLAA1TrainMetadata
- vllm_omni.distributed.omni_connectors.utils.config.ConnectorSpec
- vllm_omni.distributed.omni_connectors.utils.config.OmniTransferConfig
- vllm_omni.model_executor.models.covo_audio.config_covo_audio.CovoAudioCode2WavConfig
- vllm_omni.model_executor.models.fish_speech.configuration_fish_speech.FishSpeechConfig
- vllm_omni.model_executor.models.fish_speech.configuration_fish_speech.FishSpeechFastARConfig
- vllm_omni.model_executor.models.fish_speech.configuration_fish_speech.FishSpeechSlowARConfig
- vllm_omni.model_executor.models.higgs_audio_v2.configuration_higgs_audio_v2.HiggsAudioV2Config
- vllm_omni.model_executor.models.mimo_audio.config_mimo_audio.MiMoAudioConfig
- vllm_omni.model_executor.models.mimo_audio.config_mimo_audio.MiMoAudioTokenizerConfig
- vllm_omni.model_executor.models.moss_tts.configuration_moss_tts.MossTTSDelayConfig
- vllm_omni.model_executor.models.moss_tts.configuration_moss_tts.MossTTSLocalTransformerConfig
- vllm_omni.model_executor.models.moss_tts.configuration_moss_tts.MossTTSRealtimeConfig
- vllm_omni.model_executor.models.moss_tts_nano.configuration_moss_tts_nano.MossTTSNanoConfig
- vllm_omni.model_executor.models.qwen3_tts.configuration_qwen3_tts.Qwen3TTSConfig
- vllm_omni.model_executor.models.qwen3_tts.configuration_qwen3_tts.Qwen3TTSSpeakerEncoderConfig
- vllm_omni.model_executor.models.qwen3_tts.configuration_qwen3_tts.Qwen3TTSTalkerCodePredictorConfig
- vllm_omni.model_executor.models.qwen3_tts.configuration_qwen3_tts.Qwen3TTSTalkerConfig
- vllm_omni.model_executor.models.qwen3_tts.tokenizer_12hz.configuration_qwen3_tts_tokenizer_v2.Qwen3TTSTokenizerV2Config
- vllm_omni.model_executor.models.qwen3_tts.tokenizer_12hz.configuration_qwen3_tts_tokenizer_v2.Qwen3TTSTokenizerV2DecoderConfig
- vllm_omni.model_executor.models.qwen3_tts.tokenizer_25hz.configuration_qwen3_tts_tokenizer_v1.Qwen3TTSTokenizerV1Config
- vllm_omni.model_executor.models.qwen3_tts.tokenizer_25hz.configuration_qwen3_tts_tokenizer_v1.Qwen3TTSTokenizerV1DecoderBigVGANConfig
- vllm_omni.model_executor.models.qwen3_tts.tokenizer_25hz.configuration_qwen3_tts_tokenizer_v1.Qwen3TTSTokenizerV1DecoderConfig
- vllm_omni.model_executor.models.qwen3_tts.tokenizer_25hz.configuration_qwen3_tts_tokenizer_v1.Qwen3TTSTokenizerV1DecoderDiTConfig
- vllm_omni.model_executor.models.qwen3_tts.tokenizer_25hz.configuration_qwen3_tts_tokenizer_v1.Qwen3TTSTokenizerV1EncoderConfig
- vllm_omni.transformers_utils.configs.cosyvoice3.CosyVoice3Config
- vllm_omni.transformers_utils.configs.glm_tts.GLMTTSConfig
- vllm_omni.transformers_utils.configs.mammoth_moda2.Mammothmoda2Config
- vllm_omni.transformers_utils.configs.mammoth_moda2.Mammothmoda2Qwen2_5_VLConfig
- vllm_omni.transformers_utils.configs.mammoth_moda2.Mammothmoda2Qwen2_5_VLTextConfig
- vllm_omni.transformers_utils.configs.mammoth_moda2.Mammothmoda2Qwen2_5_VLVisionConfig
- vllm_omni.transformers_utils.configs.ming_flash_omni.BailingMM2Config
- vllm_omni.transformers_utils.configs.ming_flash_omni.BailingMoeV2Config
- vllm_omni.transformers_utils.configs.ming_flash_omni.MingFlashOmniConfig
- vllm_omni.transformers_utils.configs.ming_flash_omni.MingFlashOmniTalkerConfig
- vllm_omni.transformers_utils.configs.ming_flash_omni.MingImageGenConfig
- vllm_omni.transformers_utils.configs.ming_flash_omni.Qwen3VLMoeVisionConfig
- vllm_omni.transformers_utils.configs.ming_flash_omni.WhisperEncoderConfig
- vllm_omni.transformers_utils.configs.omnivoice.OmniVoiceConfig
- vllm_omni.transformers_utils.configs.voxcpm2.VoxCPM2Config
- vllm_omni.transformers_utils.configs.voxtral_tts.VoxtralTTSConfig
Workers
Worker classes and model runners for distributed inference.
- vllm_omni.diffusion.models.dreamzero.video_export_worker.DreamZeroVideoExportWorkerExtension
- vllm_omni.diffusion.worker.diffusion_model_runner.DiffusionModelRunner
- vllm_omni.diffusion.worker.diffusion_worker.CustomPipelineWorkerExtension
- vllm_omni.diffusion.worker.diffusion_worker.DiffusionWorker
- vllm_omni.diffusion.worker.diffusion_worker.WorkerProc
- vllm_omni.diffusion.worker.diffusion_worker.WorkerWrapperBase
- vllm_omni.diffusion.worker.input_batch.InputBatch
- vllm_omni.diffusion.worker.utils.BaseRunnerOutput
- vllm_omni.diffusion.worker.utils.BatchRunnerOutput
- vllm_omni.diffusion.worker.utils.DiffusionRequestState
- vllm_omni.diffusion.worker.utils.RunnerOutput
- vllm_omni.platforms.npu.worker.base.OmniNPUWorkerBase
- vllm_omni.platforms.npu.worker.npu_ar_model_runner.ExecuteModelState
- vllm_omni.platforms.npu.worker.npu_ar_model_runner.NPUARModelRunner
- vllm_omni.platforms.npu.worker.npu_ar_worker.NPUARWorker
- vllm_omni.platforms.npu.worker.npu_generation_model_runner.NPUGenerationModelRunner
- vllm_omni.platforms.npu.worker.npu_generation_worker.NPUGenerationWorker
- vllm_omni.platforms.npu.worker.npu_model_runner.OmniNPUModelRunner
- vllm_omni.platforms.xpu.worker.xpu_ar_model_runner.XPUARModelRunner
- vllm_omni.platforms.xpu.worker.xpu_ar_worker.XPUARWorker
- vllm_omni.platforms.xpu.worker.xpu_generation_model_runner.XPUGenerationModelRunner
- vllm_omni.platforms.xpu.worker.xpu_generation_worker.XPUGenerationWorker
- vllm_omni.worker.base.OmniGPUWorkerBase
- vllm_omni.worker.gpu_ar_model_runner.ExecuteModelState
- vllm_omni.worker.gpu_ar_model_runner.GPUARModelRunner
- vllm_omni.worker.gpu_ar_worker.GPUARWorker
- vllm_omni.worker.gpu_generation_model_runner.GPUGenerationModelRunner
- vllm_omni.worker.gpu_generation_worker.GPUGenerationWorker
- vllm_omni.worker.gpu_memory_utils.parse_cuda_visible_devices
- vllm_omni.worker.gpu_model_runner.OmniGPUModelRunner
- vllm_omni.worker.mixins.OmniWorkerMixin
- vllm_omni.worker.omni_connector_model_runner_mixin.OmniConnectorModelRunnerMixin
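
Each entry on this page is a fully qualified dotted path: a module path followed by the name of a class or function defined in that module. The sketch below shows one generic way to turn such a path into the object it names, using only the standard library. The helper name `resolve` is hypothetical and is not part of vLLM-Omni. It assumes the last path component is a module-level attribute, and it only succeeds for vLLM-Omni entries if the package and that module's dependencies are installed.

```python
import importlib


def resolve(dotted_path: str):
    """Return the object named by a 'package.module.Name' style path.

    Hypothetical helper for illustration; not part of vLLM-Omni.
    """
    # Split on the last dot: everything before it is the module,
    # the final component is the attribute (class or function) name.
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Demonstrated with a standard-library path so the snippet runs anywhere.
ordered_dict_cls = resolve("collections.OrderedDict")

# With vLLM-Omni installed, an entry from this page could be resolved the
# same way (left commented out because it needs the package):
# omni_cls = resolve("vllm_omni.entrypoints.omni.Omni")
```

In ordinary application code a plain `from module import Name` statement is the more readable choice; dynamic resolution like this is mainly useful for tooling that walks the list programmatically.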