vllm_omni.worker.gpu_generation_model_runner

Code2Wav GPU Model Runner for vLLM-Omni.

Handles direct conversion from codec codes to audio waveforms for Qwen3 Omni MoE Code2Wav. This is a non-autoregressive model that doesn't require sampling or logits computation.

logger module-attribute

logger = getLogger(__name__)

GPUGenerationModelRunner

Bases: OmniGPUModelRunner, OmniConnectorModelRunnerMixin

Generation model runner for vLLM-Omni (non-autoregressive).

  • Reuses GPUModelRunner preparation, multimodal handling, and TP/PP/DP glue.
  • Does not compute logits or perform token sampling.
  • Executes the generation process in a single pass and returns the resulting tensors via pooler_output.
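
The points above can be illustrated with a minimal, dependency-free sketch. It is not the vLLM-Omni implementation: `ToyRunnerOutput`, `toy_code2wav`, and `run_generation_step` are hypothetical names, and plain Python lists stand in for tensors. It only shows the shape of a non-autoregressive step, where each request gets one forward pass and its result is placed in a `pooler_output` field, with no logits or sampled tokens.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRunnerOutput:
    """Hypothetical stand-in for OmniModelRunnerOutput (not the real class)."""
    req_ids: list[str]
    # One waveform (list of float samples) per request, in req_ids order.
    pooler_output: list[list[float]]
    # Nothing is sampled in this sketch, so this list stays empty.
    sampled_token_ids: list[list[int]] = field(default_factory=list)


def toy_code2wav(codes: list[int], samples_per_code: int = 4) -> list[float]:
    """Hypothetical decoder: expands each codec code into a run of samples."""
    wave: list[float] = []
    for code in codes:
        amplitude = (code % 256) / 255.0
        wave.extend([amplitude] * samples_per_code)
    return wave


def run_generation_step(batch: dict[str, list[int]]) -> ToyRunnerOutput:
    """One forward pass per request: no logits, no token-by-token loop."""
    req_ids = list(batch)
    waves = [toy_code2wav(batch[req_id]) for req_id in req_ids]
    return ToyRunnerOutput(req_ids=req_ids, pooler_output=waves)


out = run_generation_step({"req-0": [0, 255], "req-1": [51]})
```

In this sketch the caller reads audio from `out.pooler_output` by request index rather than from sampled token IDs, which mirrors the role the list above assigns to `pooler_output`.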

execute_model

execute_model(
    scheduler_output: SchedulerOutput,
    intermediate_tensors: IntermediateTensors | None = None,
) -> OmniModelRunnerOutput | IntermediateTensors

profile_run

profile_run() -> None

sample_tokens

sample_tokens(
    grammar_output: GrammarOutput | None = None,
) -> (
    OmniModelRunnerOutput
    | AsyncModelRunnerOutput
    | IntermediateTensors
)