
vllm_omni.model_executor.models.cosyvoice3.pipeline

CosyVoice3 pipeline topology (frozen).

Stage 0: Talker: text prompt → speech tokens (LLM autoregressive).
Stage 1: Code2Wav: flow-matching decoder → acoustic features → waveform.

* sync_process_input_func runs when deploy.async_chunk=false: stage 1 builds full-sequence flow input via text2flow.
* async_chunk_process_next_stage_input_func runs when deploy.async_chunk=true: stage 0 streams codec chunks to stage 1 through the shared-memory connector.
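
The choice between the two hooks described above can be pictured with a small, self-contained sketch. This is not the vllm_omni implementation: `StageHooks`, `select_hook`, and the `proc.` prefix are hypothetical names introduced for illustration, and only the two hook field names and the `async_chunk` flag come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StageHooks:
    """Hypothetical stand-in for the hook fields of one stage config."""

    sync_process_input_func: Optional[str] = None
    async_chunk_process_next_stage_input_func: Optional[str] = None


def select_hook(upstream: StageHooks, downstream: StageHooks, async_chunk: bool) -> Optional[str]:
    """Return the dotted path of the hook that would apply in the given mode.

    Assumption: async mode uses the upstream stage's streaming hook, and
    sync mode uses the downstream stage's full-sequence hook.
    """
    if async_chunk:
        # Streaming mode: stage 0 pushes codec chunks on to stage 1.
        return upstream.async_chunk_process_next_stage_input_func
    # Non-streaming mode: stage 1 builds its input from the full sequence.
    return downstream.sync_process_input_func


# "proc" is a placeholder module prefix, not the real processor module path.
talker = StageHooks(
    async_chunk_process_next_stage_input_func="proc.talker2code2wav_async_chunk"
)
code2wav = StageHooks(sync_process_input_func="proc.text2flow_token_only")

streaming_hook = select_hook(talker, code2wav, async_chunk=True)
blocking_hook = select_hook(talker, code2wav, async_chunk=False)
```

Under these assumptions, exactly one hook is consulted per mode, which would explain why each stage declares only the hook relevant to its side of the hand-off.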

COSYVOICE3_PIPELINE module-attribute

COSYVOICE3_PIPELINE = PipelineConfig(
    model_type="cosyvoice3",
    model_arch="CosyVoice3Model",
    stages=(
        StagePipelineConfig(
            stage_id=0,
            model_stage="cosyvoice3_talker",
            execution_type=LLM_AR,
            input_sources=(),
            owns_tokenizer=True,
            engine_output_type="latent",
            async_chunk_process_next_stage_input_func=f"{_PROC}.talker2code2wav_async_chunk",
            custom_process_next_stage_input_func=f"{_PROC}.text2flow_full_payload",
            sampling_constraints={"stop_token_ids": [6562]},
        ),
        StagePipelineConfig(
            stage_id=1,
            model_stage="cosyvoice3_code2wav",
            execution_type=LLM_GENERATION,
            input_sources=(0,),
            final_output=True,
            final_output_type="audio",
            engine_output_type="latent",
            custom_process_input_func=f"{_PROC}.text2flow",
            sync_process_input_func=f"{_PROC}.text2flow_token_only",
        ),
    ),
)
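
The hook fields in this config are plain strings of the form `"<module path>.<function name>"`. How vllm_omni turns them into callables is not shown here; the sketch below is one common way to do it with the standard library, offered as an assumption rather than the project's actual loader. `resolve_dotted` is a hypothetical helper, and `math.sqrt` stands in for a real hook so that the example runs anywhere.

```python
import importlib
from typing import Any, Callable


def resolve_dotted(path: str) -> Callable[..., Any]:
    """Import the module part of a dotted path and return the named attribute."""
    module_name, _, attr = path.rpartition(".")
    if not module_name or not attr:
        raise ValueError(f"expected 'module.attribute', got {path!r}")
    module = importlib.import_module(module_name)
    func = getattr(module, attr)
    if not callable(func):
        raise TypeError(f"{path!r} does not refer to a callable")
    return func


# Stand-in target from the standard library; a real hook path would be
# built from the processor module prefix plus the function name.
sqrt = resolve_dotted("math.sqrt")
root = sqrt(9.0)
```

Keeping hooks as strings lets the config stay a frozen, importable constant without pulling in the processor module until a stage actually needs it.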