vllm_omni.model_executor.models.qwen3_tts.pipeline

Qwen3-TTS pipeline: Talker (text → RVQ codec) → Code2Wav (codec → audio).

Whether the pipeline runs in chunked or end-to-end mode is selected by deploy.async_chunk.
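As a rough illustration of the mode dispatch, the sketch below picks one of the two stage-0 hook paths listed in QWEN3_TTS_PIPELINE from a boolean flag. It is not the library's implementation: `StageHooks`, `select_next_stage_input_func`, and the `_proc` placeholder are hypothetical names introduced here, and the mapping of `async_chunk=True` to the `async_chunk_*` hook is an assumption based only on the field names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StageHooks:
    """Hypothetical stand-in holding the two hook paths declared on stage 0."""

    async_chunk_process_next_stage_input_func: Optional[str] = None
    custom_process_next_stage_input_func: Optional[str] = None


def select_next_stage_input_func(hooks: StageHooks, async_chunk: bool) -> Optional[str]:
    """Return the dotted path of the hook for the requested mode (assumed logic)."""
    if async_chunk:
        # Chunked mode: assumed to use the async-chunk hook.
        return hooks.async_chunk_process_next_stage_input_func
    # End-to-end mode: assumed to use the full-payload hook.
    return hooks.custom_process_next_stage_input_func


# Placeholder prefix; the real value of _PROC is not shown on this page.
_proc = "stage_input_processors"

talker_hooks = StageHooks(
    async_chunk_process_next_stage_input_func=f"{_proc}.talker2code2wav_async_chunk",
    custom_process_next_stage_input_func=f"{_proc}.talker2code2wav_full_payload",
)

chunked = select_next_stage_input_func(talker_hooks, async_chunk=True)
end_to_end = select_next_stage_input_func(talker_hooks, async_chunk=False)
```

Keeping both hook paths on the stage and choosing at deploy time means the same pipeline definition can serve either mode without being redefined.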

QWEN3_TTS_PIPELINE module-attribute

QWEN3_TTS_PIPELINE = PipelineConfig(
    model_type="qwen3_tts",
    model_arch="Qwen3TTSTalkerForConditionalGeneration",
    stages=(
        StagePipelineConfig(
            stage_id=0,
            model_stage="qwen3_tts",
            execution_type=LLM_AR,
            input_sources=(),
            owns_tokenizer=True,
            engine_output_type="latent",
            async_chunk_process_next_stage_input_func=f"{_PROC}.talker2code2wav_async_chunk",
            custom_process_next_stage_input_func=f"{_PROC}.talker2code2wav_full_payload",
            sampling_constraints={
                "detokenize": False,
                "stop_token_ids": [2150],
            },
        ),
        StagePipelineConfig(
            stage_id=1,
            model_stage="code2wav",
            execution_type=LLM_GENERATION,
            input_sources=(0,),
            final_output=True,
            final_output_type="audio",
            engine_output_type="audio",
            model_arch="Qwen3TTSCode2Wav",
            sync_process_input_func=f"{_PROC}.talker2code2wav_token_only",
            sampling_constraints={"detokenize": True},
            extras={
                "tts_args": {"max_instructions_length": 500}
            },
        ),
    ),
)
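The stage topology above (stage 0 has no inputs, stage 1 reads from stage 0 and produces the final audio) can be checked with a small sketch. The `Stage` dataclass and `validate_stages` helper are hypothetical stand-ins written for this example; they mirror only the `stage_id`, `model_stage`, `input_sources`, and `final_output` fields and are not the real `StagePipelineConfig` API. The rule that exactly one stage is final is likewise an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Stage:
    """Hypothetical, reduced mirror of a stage entry."""

    stage_id: int
    model_stage: str
    input_sources: Tuple[int, ...] = ()
    final_output: bool = False


def validate_stages(stages: Sequence[Stage]) -> int:
    """Check that each stage reads only from earlier stages; return the final stage id."""
    seen = set()
    for stage in stages:
        if stage.stage_id in seen:
            raise ValueError(f"duplicate stage_id {stage.stage_id}")
        missing = [src for src in stage.input_sources if src not in seen]
        if missing:
            raise ValueError(f"stage {stage.stage_id} reads from unknown stages {missing}")
        seen.add(stage.stage_id)

    finals = [stage.stage_id for stage in stages if stage.final_output]
    if len(finals) != 1:
        raise ValueError(f"expected exactly one final stage, found {len(finals)}")
    return finals[0]


# Same shape as the pipeline above: Talker feeds Code2Wav, which emits audio.
stages = (
    Stage(stage_id=0, model_stage="qwen3_tts"),
    Stage(stage_id=1, model_stage="code2wav", input_sources=(0,), final_output=True),
)

final_stage_id = validate_stages(stages)
```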