vllm_omni.model_executor.models.higgs_audio_v3.pipeline
higgs-audio v3 pipeline: Talker (text -> 8-codebook codec) -> Code2Wav (codec -> 24 kHz PCM).
Two delivery modes are wired here:
- `sync_process_input_func` runs when the deploy YAML has `async_chunk: false`. The orchestrator collects the entire Stage-0 output, reverts the delay pattern once, and hands a single payload to Stage 1.
- `async_chunk_process_next_stage_input_func` runs when the deploy YAML has `async_chunk: true`. Stage 0 dispatches its output after every AR step; the streaming adapter buffers the raw delay-pattern rows, slides a window with left context and right holdback, and emits codec-ready frames per chunk. Stage 1 trims the overlap on both ends of each chunk so the client receives a coherent PCM stream.
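A minimal NumPy sketch of both delivery paths, under loudly stated assumptions: the delay pattern is taken to be MusicGen-style (codebook k lags codebook 0 by k AR steps), which may not match the exact higgs-audio layout; `revert_delay_pattern`, `StreamingChunker`, `decode_and_trim`, the window sizes, and the samples-per-frame value are hypothetical illustrations, not the vllm-omni API. The sync path reverts the whole buffer once. The streaming path commits a chunk only after `right_ctx` lookahead frames exist (the holdback), prepends `left_ctx` already-committed frames as context, and the consumer drops the PCM produced for both context regions.

```python
import numpy as np


def revert_delay_pattern(delayed: np.ndarray) -> np.ndarray:
    """Undo a delay pattern where codebook k lags codebook 0 by k steps.

    ASSUMPTION: MusicGen-style delay; the real higgs-audio layout may differ.
    delayed: (T + K - 1, K) rows, one per AR step.
    returns: (T, K) aligned frames with out[t, k] == delayed[t + k, k];
    padding entries outside that diagonal band are discarded.
    """
    num_rows, k = delayed.shape
    t = max(num_rows - (k - 1), 0)
    return np.stack([delayed[c:c + t, c] for c in range(k)], axis=1)


class StreamingChunker:
    """Hypothetical adapter: raw delay-pattern rows in, overlapping windows out."""

    def __init__(self, num_codebooks, chunk, left_ctx, right_ctx):
        assert chunk >= 1, "each window must commit at least one frame"
        self.k = num_codebooks
        self.chunk, self.left, self.right = chunk, left_ctx, right_ctx
        self.rows = []       # raw delay-pattern rows received so far
        self.committed = 0   # aligned frames already emitted as window cores

    def _ready_frames(self):
        # Aligned frame t is complete only once row t + K - 1 has arrived.
        if len(self.rows) < self.k:
            return np.empty((0, self.k), dtype=np.int64)
        return revert_delay_pattern(np.stack(self.rows))

    def _window(self, frames, core):
        start = self.committed
        lo = max(0, start - self.left)
        hi = min(len(frames), start + core + self.right)
        self.committed += core
        # (window, leading context frames to drop, trailing context frames to drop)
        return frames[lo:hi], start - lo, hi - (start + core)

    def push(self, row):
        """Add one AR-step row; return any windows that are now ready."""
        self.rows.append(np.asarray(row))
        frames = self._ready_frames()
        out = []
        # Holdback: commit a chunk only once right_ctx lookahead frames exist.
        while len(frames) - self.committed >= self.chunk + self.right:
            out.append(self._window(frames, self.chunk))
        return out

    def flush(self):
        """End of stream: commit whatever is left, using any lookahead available."""
        frames = self._ready_frames()
        out = []
        while self.committed < len(frames):
            core = min(self.chunk, len(frames) - self.committed)
            out.append(self._window(frames, core))
        return out


def decode_and_trim(window, left_trim, right_trim, decode, samples_per_frame):
    """Stage-1 side: decode a window, then drop the PCM for its context frames."""
    pcm = decode(window)
    end = len(pcm) - right_trim * samples_per_frame
    return pcm[left_trim * samples_per_frame:end]


SPF = 4  # hypothetical samples per frame


def toy_decode(frames):
    # Frame-local stand-in for a codec decoder: SPF samples per frame.
    return np.repeat(frames.sum(axis=1), SPF)


rng = np.random.default_rng(0)
delayed = rng.integers(0, 1024, size=(20 + 8 - 1, 8))  # T=20 frames, K=8 codebooks

# Streaming path: feed one row per AR step, then flush at end of stream.
chunker = StreamingChunker(num_codebooks=8, chunk=6, left_ctx=2, right_ctx=2)
windows = [w for row in delayed for w in chunker.push(row)] + chunker.flush()
streamed = np.concatenate([decode_and_trim(*w, toy_decode, SPF) for w in windows])

# Sync path: revert once, decode once.
full = toy_decode(revert_delay_pattern(delayed))
assert np.array_equal(streamed, full)
```

Because the toy decoder maps each frame to its own samples, the trimmed chunks concatenate to exactly the sync result. A real codec decoder is not frame-local; the context frames are presumably there so that the samples kept from each window are computed with enough surrounding frames to avoid seams at chunk boundaries.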
HIGGS_AUDIO_V3_PIPELINE (module attribute)
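# PipelineConfig, StagePipelineConfig, LLM_AR, LLM_GENERATION and _PROC (a dotted
# module-path prefix) are imported or defined elsewhere in the module; not shown here.
HIGGS_AUDIO_V3_PIPELINE = PipelineConfig(
    model_type="higgs_multimodal_qwen3",
    model_arch="HiggsAudioV3TalkerForConditionalGeneration",
    hf_architectures=(
        "HiggsMultimodalQwen3ForConditionalGeneration",
    ),
    stages=(
        # Stage 0: Talker, autoregressive text -> 8-codebook codec tokens.
        StagePipelineConfig(
            stage_id=0,
            model_stage="higgs_audio_v3",
            execution_type=LLM_AR,
            input_sources=(),  # no upstream stage
            owns_tokenizer=True,
            engine_output_type="latent",
            sampling_constraints={
                "detokenize": False,  # output is codec tokens, not text
                "stop_token_ids": [151643, 151671],  # stop on either token ID
            },
            # Streaming hand-off, used when the deploy YAML sets async_chunk: true.
            async_chunk_process_next_stage_input_func=f"{_PROC}.talker2code2wav_async_chunk",
        ),
        # Stage 1: Code2Wav, codec frames -> 24 kHz PCM; produces the final audio.
        StagePipelineConfig(
            stage_id=1,
            model_stage="code2wav",
            execution_type=LLM_GENERATION,
            input_sources=(0,),  # consumes Stage 0 output
            final_output=True,
            final_output_type="audio",
            engine_output_type="audio",
            model_arch="HiggsAudioV3Code2WavForConditionalGeneration",
            # Whole-utterance hand-off, used when async_chunk: false.
            sync_process_input_func=f"{_PROC}.talker2code2wav",
            sampling_constraints={"detokenize": True},
        ),
    ),
)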
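To make the hook selection concrete, here is a small sketch of how the two dotted-path fields could be resolved for the Stage 0 -> Stage 1 hand-off. `StageSpec`, `handoff_func`, and the `PROC` prefix are hypothetical stand-ins (the real `_PROC` value and orchestrator code are not shown in this module). The rule encoded is the one described above: the producer's async hook when `async_chunk: true`, the consumer's sync hook otherwise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StageSpec:
    """Hypothetical stand-in for the StagePipelineConfig fields used here."""
    stage_id: int
    input_sources: tuple = ()
    sync_process_input_func: Optional[str] = None
    async_chunk_process_next_stage_input_func: Optional[str] = None


def handoff_func(stages, consumer_id, async_chunk):
    """Return the dotted path of the hook that builds `consumer_id`'s input.

    ASSUMPTION: the async hook lives on the producing stage and the sync hook
    on the consuming stage, as in HIGGS_AUDIO_V3_PIPELINE.
    """
    by_id = {s.stage_id: s for s in stages}
    consumer = by_id[consumer_id]
    if not async_chunk:
        return consumer.sync_process_input_func
    (producer_id,) = consumer.input_sources  # single upstream stage assumed
    return by_id[producer_id].async_chunk_process_next_stage_input_func


PROC = "example.stage_input_processors"  # hypothetical; the real _PROC is not shown
stages = (
    StageSpec(0, async_chunk_process_next_stage_input_func=f"{PROC}.talker2code2wav_async_chunk"),
    StageSpec(1, input_sources=(0,), sync_process_input_func=f"{PROC}.talker2code2wav"),
)
print(handoff_func(stages, 1, async_chunk=False))  # example.stage_input_processors.talker2code2wav
```

Placing the streaming hook on the producer fits its role: it runs as Stage 0 emits each AR step, while the sync hook runs once on the consumer side after Stage 0 has finished.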