vllm_omni.diffusion.models.omnivoice.pipeline_omnivoice
OmniVoice TTS Pipeline for vLLM-Omni diffusion engine.
Single-stage pipeline that runs the full text-to-speech flow:
text → tokenize → 32-step iterative unmasking → 8-codebook tokens → DAC decode → 24 kHz audio
Uses request-mode execution (all steps in one forward() call).
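The iterative unmasking stage can be pictured with a toy loop. The following is only a generic sketch of confidence-ordered unmasking over a single token sequence, not the OmniVoice implementation: `MASK`, `toy_predict`, `iterative_unmask`, the even reveal schedule, and the random stand-in for the model are all assumptions made for illustration, and the real pipeline works on 8 codebooks rather than one.

```python
import math
import random

MASK = -1  # hypothetical mask id, used only in this sketch


def toy_predict(tokens, vocab_size, rng):
    """Stand-in for the model: propose a (token, confidence) pair per position."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def iterative_unmask(length, num_steps=32, vocab_size=1024, seed=0):
    """Start fully masked and commit a share of positions at every step."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(num_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # Reveal an even share of the remaining masked positions, so that
        # nothing is left masked after the final step (assumed schedule).
        steps_left = num_steps - step
        n_reveal = math.ceil(len(masked) / steps_left)
        proposals = toy_predict(tokens, vocab_size, rng)
        # Commit the most confident proposals first.
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:n_reveal]:
            tokens[i] = proposals[i][0]
    return tokens
```

In request-mode execution, a loop of this kind would run entirely inside one forward() call instead of being scheduled step by step by the engine.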
OmniVoicePipeline
Bases: Module, SupportAudioOutput
OmniVoice text-to-speech pipeline for the diffusion engine.
Wraps OmniVoiceGenerator (32-step iterative unmasking) and OmniVoiceDecoder (HiggsAudioV2 RVQ + DAC) into a single forward() call.
forward
forward(req: OmniDiffusionRequest) -> DiffusionOutput
Generate speech audio from text, optionally with voice cloning.
Accepts either a plain-text prompt or a structured dict:
{"text": "...", "ref_audio": (samples, sr), "ref_text": "...", "lang": "...", "instruct": "..."}
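As an illustration, the two prompt forms might be built as follows. This is a sketch under assumptions: the sample container (a plain list here), the 24 kHz reference rate, the example values for "lang" and "instruct", and the `prompt_text` helper are hypothetical, and how the prompt is attached to an OmniDiffusionRequest is not shown because the documentation above does not specify it.

```python
import math

SAMPLE_RATE = 24_000  # assumed here to match the pipeline's 24 kHz output

# One second of a 220 Hz sine wave as placeholder reference audio.
ref_samples = [
    math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)
]

# Form 1: plain text, no voice cloning.
plain_prompt = "Hello from OmniVoice."

# Form 2: structured dict with a reference clip for voice cloning.
cloning_prompt = {
    "text": "Hello from OmniVoice.",
    "ref_audio": (ref_samples, SAMPLE_RATE),  # (samples, sr)
    "ref_text": "Transcript of the reference clip.",
    "lang": "en",  # illustrative value; the accepted format is an assumption
    "instruct": "Speak slowly and calmly.",
}


def prompt_text(prompt):
    """Hypothetical helper: extract the text from either prompt form."""
    if isinstance(prompt, str):
        return prompt
    return prompt["text"]
```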
load_weights
Load weights from the model directory (not from the iterator).
The diffusion model loader passes HF safetensors weights, but OmniVoice uses custom weight names (llm. → generator., audio_tokenizer. → decoder.). We therefore load from model_path directly and return all parameter names to satisfy the loader's "all weights initialized" check.
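The prefix mapping mentioned above can be sketched as a small renaming step. This assumes the arrows read as "checkpoint prefix → module prefix"; `PREFIX_MAP`, `remap_name`, and `remap_state_dict` are hypothetical names for illustration and are not taken from the vLLM-Omni source.

```python
# Assumed direction: checkpoint prefix -> pipeline submodule prefix.
PREFIX_MAP = {
    "llm.": "generator.",
    "audio_tokenizer.": "decoder.",
}


def remap_name(name):
    """Rewrite a checkpoint weight name; unknown prefixes pass through."""
    for old, new in PREFIX_MAP.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name


def remap_state_dict(weights):
    """Apply remap_name to every key of a name -> tensor mapping."""
    return {remap_name(key): value for key, value in weights.items()}
```

Returning the full set of remapped names is what would let a loader's completeness check pass even though the tensors were read outside its iterator.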
get_omnivoice_post_process_func
get_omnivoice_post_process_func(od_config: OmniDiffusionConfig)
Post-processing: convert audio tensor to numpy for WAV encoding.
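For context, the WAV encoding that follows this conversion might look like the sketch below. It uses only the Python standard library and a plain sequence of floats in place of a numpy array; `float_to_wav_bytes`, the mono layout, the [-1.0, 1.0] sample range, and the 16-bit PCM format are assumptions for illustration, not the behaviour of the returned post-processing function.

```python
import io
import struct
import wave


def float_to_wav_bytes(samples, sample_rate=24_000):
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    pcm = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # clip out-of-range values
        pcm += struct.pack("<h", int(round(clipped * 32767)))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # mono (assumed)
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))
    return buffer.getvalue()
```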