
vllm_omni.model_executor.stage_input_processors.moss_tts

Stage input processors: MOSS-TTS talker (Stage 0) → codec (Stage 1).

logger module-attribute

logger = init_logger(__name__)

talker2codec

talker2codec(
    stage_list: list[Any],
    engine_input_source: list[int],
    prompt: Any = None,
    requires_multimodal_data: bool = False,
) -> list[Any]

Convert all talker codes to a single Stage-1 token sequence.

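Stage 0 output contains codes["audio"] with shape (T, NQ), where T is the number of generated audio frames and NQ is n_vq (the number of quantizer codebooks). We flatten it to [NQ * T] and use that as the Stage-1 input_ids, so the codec can reshape it back to (NQ, T) for decoding. For that reshape to recover the codes, the flattened sequence must be codebook-major: all T codes of codebook 0 first, then all T codes of codebook 1, and so on.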
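The layout requirement can be illustrated with a minimal sketch in plain Python lists rather than tensors. The helper names flatten_codebook_major and unflatten_to_nq_t are hypothetical, and this page does not confirm that the actual implementation transposes in exactly this way; the sketch only shows a flatten that a reshape to (NQ, T) will invert correctly.

```python
def flatten_codebook_major(codes: list[list[int]]) -> list[int]:
    """Flatten a (T, NQ) grid of codes into one codebook-major list of length NQ * T."""
    if not codes:
        return []
    nq = len(codes[0])
    # Transpose while flattening: all T codes of codebook 0, then codebook 1, ...
    return [frame[q] for q in range(nq) for frame in codes]


def unflatten_to_nq_t(flat: list[int], nq: int) -> list[list[int]]:
    """Reshape a flat list of length NQ * T back into NQ rows of T codes each."""
    if len(flat) % nq != 0:
        raise ValueError("length is not a multiple of NQ")
    t = len(flat) // nq
    return [flat[q * t:(q + 1) * t] for q in range(nq)]


codes = [[1, 2], [3, 4], [5, 6]]  # T = 3 frames, NQ = 2 codebooks
flat = flatten_codebook_major(codes)
print(flat)  # [1, 3, 5, 2, 4, 6]
print(unflatten_to_nq_t(flat, nq=2))  # [[1, 3, 5], [2, 4, 6]]
```

By contrast, flattening the (T, NQ) grid row by row gives [1, 2, 3, 4, 5, 6], and reshaping that to (2, 3) yields [[1, 2, 3], [4, 5, 6]], which mixes codes from both codebooks in each row.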

talker2codec_async_chunk

talker2codec_async_chunk(
    transfer_manager: Any,
    pooling_output: dict[str, Any] | None,
    request: Any,
    is_finished: bool = False,
) -> OmniPayloadStruct | None

Emit accumulated audio codes to Stage 1 as they arrive from Stage 0.

State is maintained in transfer_manager, keyed by request ID. A chunk is forwarded to Stage 1 when either (a) is_finished is True, in which case all remaining codes are flushed, or (b) the accumulated frame count reaches chunk_frames (default 25).

Returns a dict compatible with the Stage-1 input format, or None to signal that not enough data has arrived yet and the caller should wait for more frames.
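The flush rule above can be sketched as a small per-request buffer. Everything here is hypothetical: ChunkBuffer stands in for the state held in transfer_manager, the payload keys ("request_id", "codes", "finished") are illustrative rather than the real OmniPayloadStruct fields, and the sketch assumes the buffer is cleared after each emitted chunk.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

CHUNK_FRAMES = 25  # default threshold stated in the docstring


@dataclass
class ChunkBuffer:
    """Hypothetical stand-in for the per-request state kept in transfer_manager."""

    pending: dict[str, list[list[int]]] = field(default_factory=dict)

    def push(
        self,
        request_id: str,
        frames: list[list[int]] | None,
        is_finished: bool,
        chunk_frames: int = CHUNK_FRAMES,
    ) -> dict[str, Any] | None:
        buf = self.pending.setdefault(request_id, [])
        if frames:
            buf.extend(frames)  # accumulate new (NQ-wide) frames
        if not buf:
            # Nothing to send; drop the state once the request is done.
            if is_finished:
                self.pending.pop(request_id, None)
            return None
        if not is_finished and len(buf) < chunk_frames:
            return None  # not enough data yet: wait for more frames
        # Flush: hand over everything accumulated so far.
        chunk = list(buf)
        if is_finished:
            del self.pending[request_id]  # final flush, release the state
        else:
            buf.clear()  # start accumulating the next chunk
        return {"request_id": request_id, "codes": chunk, "finished": is_finished}
```

With the default threshold, the first 24 single-frame pushes return None, the 25th returns a 25-frame chunk, and a final push with is_finished=True flushes whatever remains and removes the request's entry.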