vllm_omni.model_executor.models.omnivoice.omnivoice ¶
OmniVoice model for vLLM-Omni two-stage TTS pipeline.
Stage 0 (Generator): Qwen3 backbone + iterative unmasking → 8-codebook tokens
Stage 1 (Decoder): HiggsAudioV2 decoder → 24 kHz waveform
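The "iterative unmasking" step in Stage 0 can be pictured with a toy loop. This is only a sketch of the general technique, not the OmniVoice implementation: the `MASK` sentinel, `toy_predict`, the single token stream (the real generator emits 8 codebooks), and the even per-step quota are all assumptions made for illustration.

```python
import random

MASK = -1  # hypothetical sentinel for a still-masked position


def toy_predict(tokens, rng):
    """Stand-in for the backbone: one (token, confidence) proposal per position."""
    return [(rng.randrange(1024), rng.random()) for _ in tokens]


def iterative_unmask(length, num_steps, seed=0):
    """Fill a fully masked sequence over num_steps rounds, most confident first."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(num_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        proposals = toy_predict(tokens, rng)
        # Commit an even share of the remaining masked positions this round.
        remaining_steps = num_steps - step
        quota = -(-len(masked) // remaining_steps)  # ceiling division
        best = sorted(masked, key=lambda i: proposals[i][1], reverse=True)[:quota]
        for i in best:
            tokens[i] = proposals[i][0]
    return tokens
```

Each round commits only the proposals with the highest confidence and leaves the rest masked, so the last round always fills whatever remains.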
OmniVoiceDummyInputsBuilder ¶
Bases: BaseDummyInputsBuilder[OmniVoiceMultiModalProcessingInfo]
get_dummy_mm_data ¶
get_dummy_mm_data(
    seq_len: int,
    mm_counts: Mapping[str, int],
    mm_options: Mapping[str, BaseDummyOptions] | None = None,
) -> MultiModalDataDict
OmniVoiceModel ¶
Bases: Module
OmniVoice model for vLLM-Omni two-stage pipeline.
Routes to generator (Stage 0) or decoder (Stage 1) based on model_stage.
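The routing described above might look roughly like the following. This is a minimal sketch, not the actual class: `StageRouter`, the stage labels `"generator"` and `"decoder"`, and the callables passed in are hypothetical, since the source only states that dispatch depends on `model_stage`.

```python
class StageRouter:
    """Toy dispatcher: picks one sub-model based on a model_stage string."""

    def __init__(self, model_stage, generator, decoder):
        # "generator" / "decoder" are assumed labels, not confirmed values.
        stages = {"generator": generator, "decoder": decoder}
        if model_stage not in stages:
            raise ValueError(f"unknown model_stage: {model_stage!r}")
        self.model_stage = model_stage
        self._impl = stages[model_stage]

    def forward(self, inputs):
        # Delegate to whichever stage was selected at construction time.
        return self._impl(inputs)
```

Resolving the stage once in the constructor keeps `forward` free of branching and rejects an unknown stage as soon as the object is built.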
OmniVoiceMultiModalProcessingInfo ¶
OmniVoiceMultiModalProcessor ¶
Bases: BaseMultiModalProcessor[OmniVoiceMultiModalProcessingInfo]
Processes text + optional reference audio for OmniVoice.
For voice cloning: text + reference audio → tokenized reference
For auto voice: text only
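The two input modes can be illustrated with a small branching helper. This is a hedged sketch only: `prepare_request`, the `mode` labels, and the dictionary keys are hypothetical. The real processor tokenizes the reference audio, which is not reproduced here.

```python
from typing import Optional, Sequence


def prepare_request(text: str, ref_audio: Optional[Sequence[float]] = None) -> dict:
    """Return a toy request dict; the keys are illustrative only."""
    if not text:
        raise ValueError("text must be non-empty")
    if ref_audio is None:
        # Auto voice: the text is the only input.
        return {"mode": "auto_voice", "text": text}
    if len(ref_audio) == 0:
        raise ValueError("reference audio is empty")
    # Voice cloning: the real processor tokenizes the reference audio;
    # here the raw samples are carried along untouched as a placeholder.
    return {"mode": "voice_clone", "text": text, "ref_audio": list(ref_audio)}
```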