vllm_omni.diffusion.models.omnivoice.pipeline_omnivoice ¶

OmniVoice TTS Pipeline for vLLM-Omni diffusion engine.

Single-stage pipeline that runs the full text-to-speech flow:

text → tokenize → 32-step iterative unmasking → 8-codebook tokens → DAC decode → 24 kHz audio

Uses request-mode execution (all steps in one forward() call).
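The iterative unmasking stage can be illustrated with a toy loop. This is a minimal sketch only, not the OmniVoiceGenerator implementation: the linear reveal schedule, the `MASK` sentinel, the random stand-in for model confidence, and the helper names `unmask_counts` and `toy_unmask` are all assumptions made for illustration. The real generator presumably also uses `t_shift`, `guidance_scale`, and the temperature settings, which are not modeled here.

```python
import random

MASK = -1  # hypothetical sentinel for a still-masked token position


def unmask_counts(total_tokens: int, num_step: int = 32) -> list:
    """Number of positions to reveal at each step (assumed linear schedule)."""
    counts = []
    revealed = 0
    for step in range(1, num_step + 1):
        target = (total_tokens * step) // num_step  # cumulative target after this step
        counts.append(target - revealed)
        revealed = target
    return counts


def toy_unmask(total_tokens: int, num_step: int = 32, vocab: int = 1024, seed: int = 0) -> list:
    """Start fully masked and reveal a few positions per step until none remain."""
    rng = random.Random(seed)
    tokens = [MASK] * total_tokens
    for n in unmask_counts(total_tokens, num_step):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Stand-in for model confidence: a random score per masked position.
        ranked = sorted(masked, key=lambda i: rng.random(), reverse=True)
        for i in ranked[:n]:
            tokens[i] = rng.randrange(vocab)  # stand-in for a sampled codebook token
    return tokens


# 8 codebooks x 100 frames, flattened for simplicity.
counts = unmask_counts(total_tokens=8 * 100, num_step=32)
tokens = toy_unmask(total_tokens=8 * 100, num_step=32)
```

Because the cumulative target reaches `total_tokens` on the final step, every position is guaranteed to be unmasked once all steps have run, which is the property that lets the whole loop live inside one forward() call.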

logger `module-attribute` ¶

logger = init_logger(__name__)

OmniVoicePipeline ¶

Bases: Module, SupportAudioOutput

OmniVoice text-to-speech pipeline for the diffusion engine.

Wraps OmniVoiceGenerator (32-step iterative unmasking) and OmniVoiceDecoder (HiggsAudioV2 RVQ + DAC) into a single forward() call.

audio_tokenizer `instance-attribute` ¶

audio_tokenizer = (
    HiggsAudioV2TokenizerModel.from_pretrained(
        audio_tokenizer_path, device_map=self.device
    ).eval()
)

class_temperature `instance-attribute` ¶

class_temperature = self.config.class_temperature

config `instance-attribute` ¶

config = OmniVoiceConfig(**hf_config)

decoder `instance-attribute` ¶

decoder = OmniVoiceDecoder(self.config)

device `instance-attribute` ¶

device = get_local_device()

duration_estimator `instance-attribute` ¶

duration_estimator = RuleDurationEstimator()

generator `instance-attribute` ¶

generator = OmniVoiceGenerator(self.config)

guidance_scale `instance-attribute` ¶

guidance_scale = self.config.guidance_scale

layer_penalty_factor `instance-attribute` ¶

layer_penalty_factor = self.config.layer_penalty_factor

model_path `instance-attribute` ¶

model_path = od_config.model

num_step `instance-attribute` ¶

num_step = self.config.num_step

od_config `instance-attribute` ¶

od_config = od_config

position_temperature `instance-attribute` ¶

position_temperature = self.config.position_temperature

sample_rate `instance-attribute` ¶

sample_rate = self.config.sample_rate

support_audio_output `class-attribute` ¶

support_audio_output: bool = True

t_shift `instance-attribute` ¶

t_shift = self.config.t_shift

tokenizer `instance-attribute` ¶

tokenizer = HFTokenizer.from_file(tokenizer_path)

forward ¶

forward(req: DiffusionRequestBatch) -> DiffusionOutput

Generate speech audio from text, optionally with voice cloning.

Accepts either a plain text prompt or a structured dict:

{"text": "...", "ref_audio": (samples, sr), "ref_text": "...", "lang": "...", "instruct": "..."}
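The two accepted prompt shapes can be reduced to one form before generation. The helper below is a hypothetical sketch, not part of the pipeline: `normalize_prompt` and `PROMPT_KEYS` are illustrative names, and treating every field other than `text` as optional is an assumption based on the description of voice cloning as optional.

```python
from typing import Any, Dict, Union

# Field names taken from the structured dict shown above.
PROMPT_KEYS = ("text", "ref_audio", "ref_text", "lang", "instruct")


def normalize_prompt(prompt: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return a dict with every known key present (hypothetical helper)."""
    if isinstance(prompt, str):
        prompt = {"text": prompt}  # a plain string is treated as the text field
    if "text" not in prompt:
        raise ValueError("prompt must contain a 'text' field")
    # Missing optional fields default to None (assumption).
    return {key: prompt.get(key) for key in PROMPT_KEYS}


plain = normalize_prompt("Hello there.")
cloned = normalize_prompt(
    {
        "text": "Hello there.",
        "ref_audio": ([0.0, 0.1, -0.1], 24000),  # (samples, sr)
        "ref_text": "Reference transcript.",
    }
)
```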

load_weights ¶

load_weights(
    weights: Iterable[tuple[str, Tensor]],
) -> set[str]

Load weights from the model directory (not from the iterator).

The diffusion model loader passes HF safetensors weights, but OmniVoice has custom weight names (llm. → generator., audio_tokenizer. → decoder.). We load from model_path directly and return all param names to satisfy the loader's "all weights initialized" check.
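The prefix renaming described above can be sketched as a simple string rewrite. This is an illustration under stated assumptions, not the actual load_weights body: `PREFIX_MAP` and `remap_name` are hypothetical names, and passing unmapped names through unchanged is an assumption.

```python
# Checkpoint prefix -> pipeline submodule prefix, as stated in the docstring.
PREFIX_MAP = {
    "llm.": "generator.",
    "audio_tokenizer.": "decoder.",
}


def remap_name(name: str) -> str:
    """Rewrite a checkpoint weight name to the pipeline's parameter name."""
    for old, new in PREFIX_MAP.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name  # assumption: names without a known prefix are left as-is


checkpoint_names = [
    "llm.layers.0.attn.weight",
    "audio_tokenizer.quantizer.codebook",
]
remapped = {remap_name(n) for n in checkpoint_names}
```

Returning a set of parameter names mirrors the `set[str]` return type of load_weights, which the loader uses for its "all weights initialized" check.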

get_omnivoice_post_process_func ¶

get_omnivoice_post_process_func(
    od_config: OmniDiffusionConfig,
)

Post-processing: convert the audio tensor to a NumPy array for WAV encoding.
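The WAV-encoding step that follows post-processing can be sketched with the standard library alone. This is not the function returned by get_omnivoice_post_process_func: a plain list of floats stands in for the NumPy array, and `float_to_wav_bytes`, the mono layout, and the 16-bit PCM format are assumptions. Only the 24 kHz sample rate comes from the pipeline description.

```python
import io
import struct
import wave


def float_to_wav_bytes(samples, sample_rate: int = 24000) -> bytes:
    """Encode float samples in [-1, 1] as mono 16-bit PCM WAV (hypothetical helper)."""
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to the valid range
        pcm += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)   # mono (assumption)
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(pcm))
    return buf.getvalue()


wav_bytes = float_to_wav_bytes([0.0, 0.5, -0.5, 2.0])
```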
