vllm_omni.worker.gpu_generation_model_runner ¶
Code2Wav GPU Model Runner for vLLM-Omni.
Handles direct conversion from codec codes to audio waveforms for Qwen3 Omni MoE Code2Wav. This is a non-autoregressive model that doesn't require sampling or logits computation.
GPUGenerationModelRunner ¶
Bases: OmniGPUModelRunner, OmniConnectorModelRunnerMixin
Generation model runner for vLLM-Omni (non-autoregressive).
- Reuses GPUModelRunner preparation, multimodal handling, and TP/PP/DP glue.
- Does not compute logits or perform token sampling.
- Executes the generation process and returns tensors via pooler_output.
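The behavior listed above can be illustrated with a minimal sketch. It is not the vLLM-Omni implementation: `SketchRunnerOutput`, `toy_code2wav`, and `run_generation_step` are hypothetical stand-ins, and the "decoder" is a toy mapping chosen only so the example runs without a GPU or model weights. The point it shows is the shape of a non-autoregressive step: one pass per request, no logits, no sampled tokens, and the generated data carried in a `pooler_output` field.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for a runner output; NOT the real OmniModelRunnerOutput.
@dataclass
class SketchRunnerOutput:
    req_ids: list[str]
    # Left empty on purpose: a non-autoregressive step samples no tokens.
    sampled_token_ids: list[list[int]] = field(default_factory=list)
    # Generated data (here, toy waveforms) travels in this field instead.
    pooler_output: list[list[float]] = field(default_factory=list)


def toy_code2wav(codes: list[int]) -> list[float]:
    """Toy decoder: maps each codec code to two waveform samples in one pass."""
    waveform: list[float] = []
    for code in codes:
        value = (code % 256) / 255.0 * 2.0 - 1.0  # scale into [-1, 1]
        waveform.extend([value, value])  # 2x "upsampling", purely illustrative
    return waveform


def run_generation_step(batch: dict[str, list[int]]) -> SketchRunnerOutput:
    """Run one generation step: no logits, no sampling, one output per request."""
    req_ids = list(batch)
    waveforms = [toy_code2wav(batch[req_id]) for req_id in req_ids]
    return SketchRunnerOutput(req_ids=req_ids, pooler_output=waveforms)


out = run_generation_step({"req-0": [0, 255], "req-1": [51]})
print(out.req_ids, [len(w) for w in out.pooler_output])
```

In this sketch the whole waveform for a request is produced in a single call, which is the practical difference from an autoregressive runner that would loop over decode steps and sample a token at each one.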
execute_model ¶
execute_model(
    scheduler_output: SchedulerOutput,
    intermediate_tensors: IntermediateTensors | None = None,
) -> OmniModelRunnerOutput | IntermediateTensors
sample_tokens ¶
sample_tokens(
    grammar_output: GrammarOutput | None = None,
) -> (
    OmniModelRunnerOutput
    | AsyncModelRunnerOutput
    | IntermediateTensors
)
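Both methods return a union of types, so a caller has to branch on what it receives. The sketch below shows one way to do that. It rests on assumptions not stated on this page: that `IntermediateTensors` is what a non-final pipeline-parallel rank hands to the next stage, and that an async output exposes a `get_output()` method that yields the final result. All `Fake*` classes and the `resolve` helper are hypothetical stand-ins, not vLLM or vLLM-Omni types.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the three return types in the signatures above.
@dataclass
class FakeIntermediateTensors:
    tensors: dict


@dataclass
class FakeRunnerOutput:
    pooler_output: list


class FakeAsyncRunnerOutput:
    """Wraps a result that is only materialized on request (assumed interface)."""

    def __init__(self, pending: FakeRunnerOutput):
        self._pending = pending

    def get_output(self) -> FakeRunnerOutput:
        return self._pending


def resolve(result):
    """Classify a runner result as ('forward', tensors) or ('final', output)."""
    if isinstance(result, FakeIntermediateTensors):
        # Assumed meaning: this rank is not the last stage; pass tensors onward.
        return ("forward", result)
    if isinstance(result, FakeAsyncRunnerOutput):
        # Assumed meaning: the output is deferred; materialize it first.
        result = result.get_output()
    return ("final", result)


final = FakeRunnerOutput(pooler_output=[[0.0, 0.5]])
kind_a, _ = resolve(FakeIntermediateTensors(tensors={}))
kind_b, value_b = resolve(FakeAsyncRunnerOutput(final))
kind_c, value_c = resolve(final)
print(kind_a, kind_b, kind_c)
```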