
vllm_omni.worker.gpu_generation_model_runner

Code2Wav GPU Model Runner for vLLM-Omni.

Converts codec codes directly into audio waveforms for the Qwen3 Omni MoE Code2Wav model. Because the model is non-autoregressive, it needs neither logits computation nor token sampling.
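A minimal sketch of what "non-autoregressive" means for this data flow, assuming a toy lookup-table decoder in place of the real Code2Wav network. Each codec code is decoded without looking at any previously generated output, so there are no logits to compute and no sampling loop. `TOY_CODEBOOK` and `toy_code2wav` are hypothetical names used only for illustration; they are not part of vLLM-Omni.

```python
from typing import List, Sequence

# Hypothetical toy "codebook": each codec code maps to a short waveform chunk.
# The real Code2Wav model is a neural decoder; this only illustrates the data flow.
TOY_CODEBOOK = {
    0: [0.0, 0.0],
    1: [0.5, -0.5],
    2: [1.0, -1.0],
}


def toy_code2wav(codes: Sequence[int]) -> List[float]:
    """Decode a whole sequence of codec codes in one pass.

    No logits are produced and nothing is sampled: each output chunk is a
    deterministic function of its input code and never depends on a
    previously generated output.
    """
    waveform: List[float] = []
    for code in codes:
        waveform.extend(TOY_CODEBOOK[code])
    return waveform


print(toy_code2wav([1, 2, 0]))  # [0.5, -0.5, 1.0, -1.0, 0.0, 0.0]
```

Contrast this with an autoregressive decoder, where each step would feed a sampled token back into the next forward pass.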

logger `module-attribute`

logger = logging.getLogger(__name__)

GPUGenerationModelRunner

Bases: OmniGPUModelRunner, OmniConnectorModelRunnerMixin

Generation model runner for vLLM-Omni (non-autoregressive).

Reuses GPUModelRunner's input preparation, multimodal handling, and tensor/pipeline/data-parallel (TP/PP/DP) glue.
Does not compute logits or perform token sampling.
Executes the generation process and returns the resulting tensors via pooler_output.
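The sketch below illustrates the idea of returning generated tensors through a `pooler_output` field instead of sampled token IDs. `ToyRunnerOutput` and `toy_generation_step` are hypothetical stand-ins, not vLLM-Omni's real `OmniModelRunnerOutput`, and the "decoder" is a fake linear scaling used only so the example runs without a GPU or model weights. Plain Python lists stand in for tensors.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyRunnerOutput:
    """Hypothetical stand-in for a runner output; not vLLM-Omni's real class."""
    req_ids: List[str]
    # A generation runner samples nothing, so each request gets an empty list.
    sampled_token_ids: List[List[int]]
    # Per-request results (lists standing in for tensors) travel in pooler_output.
    pooler_output: List[Optional[List[float]]] = field(default_factory=list)


def toy_generation_step(batch: Dict[str, List[int]]) -> ToyRunnerOutput:
    """Run one non-autoregressive step over a batch of requests."""
    req_ids = list(batch)
    outputs = []
    for req_id in req_ids:
        codes = batch[req_id]
        # Hypothetical decoder: map codes in [0, 1024] linearly onto [-1, 1].
        outputs.append([c / 1024.0 * 2.0 - 1.0 for c in codes])
    return ToyRunnerOutput(
        req_ids=req_ids,
        sampled_token_ids=[[] for _ in req_ids],
        pooler_output=outputs,
    )


out = toy_generation_step({"a": [0, 512], "b": [1024]})
print(out.pooler_output)  # [[-1.0, 0.0], [1.0]]
```

Keeping the per-request results in a list aligned with `req_ids` lets a caller match each output to its request by index, without any token-level bookkeeping.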

execute_model

execute_model(
    scheduler_output: SchedulerOutput,
    intermediate_tensors: IntermediateTensors | None = None,
) -> OmniModelRunnerOutput | IntermediateTensors
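The return annotation of `execute_model` admits two outcomes, so a caller has to branch on the result type. In vLLM-style pipeline parallelism, non-final stages typically hand `IntermediateTensors` to the next stage, while the final stage produces a finished output; treat that mapping as an assumption here. `ToyIntermediateTensors`, `ToyOmniOutput`, and `handle_step` are hypothetical stand-ins, not the real vLLM or vLLM-Omni classes.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class ToyIntermediateTensors:
    """Hypothetical stand-in for activations handed to the next pipeline stage."""
    tensors: Dict[str, List[float]]


@dataclass
class ToyOmniOutput:
    """Hypothetical stand-in for a final runner output."""
    pooler_output: List[List[float]]


def handle_step(result: Union[ToyOmniOutput, ToyIntermediateTensors]) -> str:
    # Intermediate results must be forwarded to the next stage rather than
    # returned to the caller as finished audio.
    if isinstance(result, ToyIntermediateTensors):
        return "forward to next pipeline stage"
    return f"{len(result.pooler_output)} finished output(s)"


print(handle_step(ToyIntermediateTensors({"hidden": [0.1, 0.2]})))
print(handle_step(ToyOmniOutput([[0.0], [1.0]])))
```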

profile_run

profile_run() -> None

sample_tokens

sample_tokens(
    grammar_output: GrammarOutput | None = None,
) -> (
    OmniModelRunnerOutput
    | AsyncModelRunnerOutput
    | IntermediateTensors
)