vllm_omni.diffusion.models.omnivoice.pipeline_omnivoice

OmniVoice TTS Pipeline for vLLM-Omni diffusion engine.

Single-stage pipeline that runs the full text-to-speech flow:

text → tokenize → 32-step iterative unmasking → 8-codebook tokens → DAC decode → 24kHz audio
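The iterative unmasking stage in the flow above can be pictured with a toy sketch. It starts from a fully masked grid of 8 codebooks by `num_frames` positions and reveals a growing share of positions over 32 steps. Everything here is an illustrative assumption rather than the pipeline's code: the `MASK` sentinel, the linear reveal schedule, the `vocab_size` of 1024, and the random stand-in for the model's predictions. The real generator presumably orders positions by model confidence and shapes the schedule with parameters such as `t_shift`.

```python
import random

MASK = -1  # hypothetical sentinel for a position that is still masked


def toy_iterative_unmask(num_frames, num_codebooks=8, num_step=32,
                         vocab_size=1024, seed=0):
    """Fill a (num_codebooks x num_frames) token grid over num_step steps."""
    rng = random.Random(seed)
    tokens = [[MASK] * num_frames for _ in range(num_codebooks)]

    # Random order stands in for the confidence-based ordering a real model would use.
    masked = [(c, t) for c in range(num_codebooks) for t in range(num_frames)]
    rng.shuffle(masked)

    total = len(masked)
    revealed = 0
    for step in range(1, num_step + 1):
        # Linear schedule (an assumption): after step k, k/num_step of the grid is revealed.
        target = round(total * step / num_step)
        for _ in range(target - revealed):
            c, t = masked.pop()
            # A real model would sample this token from its predicted distribution.
            tokens[c][t] = rng.randrange(vocab_size)
        revealed = target
    return tokens


grid = toy_iterative_unmask(num_frames=10)
print(len(grid), len(grid[0]))  # codebooks, frames
```

After the final step every position holds a token, which is the shape of input a codec decoder expects before producing a waveform.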

Uses request-mode execution (all steps in one forward() call).

logger module-attribute

logger = init_logger(__name__)

OmniVoicePipeline

Bases: Module, SupportAudioOutput

OmniVoice text-to-speech pipeline for the diffusion engine.

Wraps OmniVoiceGenerator (32-step iterative unmasking) and OmniVoiceDecoder (HiggsAudioV2 RVQ + DAC) into a single forward() call.

audio_tokenizer instance-attribute

audio_tokenizer = eval()

class_temperature instance-attribute

class_temperature = class_temperature

config instance-attribute

config = OmniVoiceConfig(**hf_config)

decoder instance-attribute

decoder = OmniVoiceDecoder(config)

device instance-attribute

device = get_local_device()

duration_estimator instance-attribute

duration_estimator = RuleDurationEstimator()

generator instance-attribute

generator = OmniVoiceGenerator(config)

guidance_scale instance-attribute

guidance_scale = guidance_scale

layer_penalty_factor instance-attribute

layer_penalty_factor = layer_penalty_factor

model_path instance-attribute

model_path = model

num_step instance-attribute

num_step = num_step

od_config instance-attribute

od_config = od_config

position_temperature instance-attribute

position_temperature = position_temperature

sample_rate instance-attribute

sample_rate = sample_rate

support_audio_output class-attribute

support_audio_output: bool = True

t_shift instance-attribute

t_shift = t_shift

tokenizer instance-attribute

tokenizer = from_file(tokenizer_path)

forward

Generate speech audio from text, optionally with voice cloning.

Accepts either a plain text prompt or a structured dict:

{"text": "...", "ref_audio": (samples, sr), "ref_text": "...", "lang": "...", "instruct": "..."}
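The two accepted input shapes can be reconciled with a small normalization step. The sketch below is a hypothetical helper, not part of the pipeline: `normalize_prompt` and `PROMPT_KEYS` are names introduced here, and it assumes that any key other than `text` may be omitted and treated as `None`.

```python
# Keys taken from the structured dict shown above.
PROMPT_KEYS = ("text", "ref_audio", "ref_text", "lang", "instruct")


def normalize_prompt(prompt):
    """Return a dict with every documented key, given a str or a dict prompt."""
    if isinstance(prompt, str):
        # Plain text prompt: no reference audio, so no voice cloning.
        prompt = {"text": prompt}
    # Assumption: missing keys are optional and default to None.
    return {key: prompt.get(key) for key in PROMPT_KEYS}


plain = normalize_prompt("Hello there.")
cloned = normalize_prompt({
    "text": "Hello there.",
    "ref_audio": ([0.0, 0.1, -0.1], 24000),  # (samples, sr), placeholder values
    "ref_text": "Transcript of the reference clip.",
})
print(plain["ref_audio"], cloned["ref_audio"][1])
```

Normalizing up front keeps the rest of a caller's code free of `isinstance` checks, whichever form the request arrived in.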

load_weights

load_weights(
    weights: Iterable[tuple[str, Tensor]],
) -> set[str]

Load weights from the model directory (not from the iterator).

The diffusion model loader passes HF safetensors weights, but OmniVoice has custom weight names (llm. → generator., audio_tokenizer. → decoder.). We load from model_path directly and return all param names to satisfy the loader's "all weights initialized" check.
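The prefix renaming described above can be sketched as a plain string mapping. This is an illustration under assumptions, not the method's actual body: `PREFIX_MAP` and `remap_weight_name` are hypothetical names, and the example parameter names are made up.

```python
# Checkpoint prefix -> pipeline submodule prefix, as described above.
PREFIX_MAP = {
    "llm.": "generator.",
    "audio_tokenizer.": "decoder.",
}


def remap_weight_name(name):
    """Rewrite a checkpoint parameter name to the pipeline's naming scheme."""
    for old, new in PREFIX_MAP.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name  # names without a known prefix pass through unchanged


# Hypothetical checkpoint names, used only to show the mapping.
checkpoint_names = ["llm.layers.0.weight", "audio_tokenizer.quantizer.codebook"]
loaded = {remap_weight_name(n) for n in checkpoint_names}
print(sorted(loaded))
```

Returning the set of remapped names mirrors the `set[str]` return type in the signature, which the loader uses for its initialization check.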

get_omnivoice_post_process_func

get_omnivoice_post_process_func(
    od_config: OmniDiffusionConfig,
)

Post-processing: convert audio tensor to numpy for WAV encoding.
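For context, the WAV encoding that follows this conversion can be sketched with the standard library alone. The sketch assumes mono float samples in [-1, 1] and 16-bit PCM output at the 24 kHz rate mentioned in the module description. `encode_wav` is a hypothetical helper, and a plain list of floats stands in for the numpy array the post-processing function produces.

```python
import io
import struct
import wave


def encode_wav(samples, sample_rate=24000):
    """Encode mono float samples in [-1, 1] as 16-bit PCM WAV bytes."""
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clamp to avoid integer overflow
        pcm += struct.pack("<h", int(round(s * 32767)))  # little-endian int16

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))
    return buf.getvalue()


wav_bytes = encode_wav([0.0, 0.5, -0.5, 1.0])
print(len(wav_bytes))
```

Clamping before the integer conversion matters because model output can overshoot the nominal range, and an out-of-range value would make `struct.pack` raise an error.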