vllm_omni.model_executor.models.qwen3_tts.qwen3_tts_talker ¶
AttentiveStatisticsPooling ¶
Bases: Module
Attentive statistics pooling layer: returns the attention-weighted mean and standard deviation, concatenated along the channel dimension.
conv instance-attribute ¶
conv = Conv1d(
attention_channels,
channels,
kernel_size=1,
padding="same",
padding_mode="reflect",
)
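The pooling idea can be sketched in plain Python. This is a minimal illustration of attention-weighted statistics, not this module's forward pass: the function name, the list-based inputs, and the assumption that attention scores are already computed per channel and per frame are all hypothetical.

```python
import math

def attentive_stats_pool(features, scores):
    """Pool a (channels x time) feature map into [means..., stds...].

    features: list of channels, each a list of T floats.
    scores:   unnormalized attention scores with the same shape (assumed given).
    """
    means, stds = [], []
    for chan, chan_scores in zip(features, scores):
        # Softmax over the time axis gives per-frame weights for this channel.
        peak = max(chan_scores)
        exps = [math.exp(s - peak) for s in chan_scores]
        total = sum(exps)
        weights = [e / total for e in exps]

        mean = sum(w * x for w, x in zip(weights, chan))
        var = sum(w * (x - mean) ** 2 for w, x in zip(weights, chan))
        means.append(mean)
        # Clamp before the square root to keep the result numerically safe.
        stds.append(math.sqrt(max(var, 1e-12)))
    # Concatenate the statistics: the output has 2 * channels entries.
    return means + stds

pooled = attentive_stats_pool(
    features=[[1.0, 3.0], [2.0, 2.0]],
    scores=[[0.0, 0.0], [5.0, -5.0]],
)
```

Because the mean and standard deviation are concatenated, the pooled vector has twice as many channels as the input, which is consistent with the fc layer in Qwen3TTSSpeakerEncoder taking enc_channels[-1] * 2 input channels.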
Qwen3TTSSpeakerEncoder ¶
Bases: Module
ECAPA-TDNN speaker encoder.
Reference: "ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification" (https://huggingface.co/papers/2005.07143).
asp instance-attribute ¶
asp = AttentiveStatisticsPooling(
enc_channels[-1],
attention_channels=enc_attention_channels,
)
fc instance-attribute ¶
fc = Conv1d(
enc_channels[-1] * 2,
enc_dim,
kernel_size=1,
padding="same",
padding_mode="reflect",
)
mfa instance-attribute ¶
mfa = TimeDelayNetBlock(
enc_channels[-1],
enc_channels[-1],
enc_kernel_sizes[-1],
enc_dilations[-1],
)
Qwen3TTSTalkerForConditionalGeneration ¶
Bases: Module
Autoregressive (AR) talker for vLLM. Decodes the layer-0 codec token step by step, predicts the residual codebooks (1..Q-1) into audio_codes, and streams text via tailing_text_hidden.
code_predictor instance-attribute ¶
code_predictor = (
Qwen3TTSTalkerCodePredictorForConditionalGenerationVLLM(
vllm_config=_code_predictor_vllm_config,
config=code_predictor_config,
talker_config=talker_config,
prefix="code_predictor",
)
)
gpu_resident_buffer_keys instance-attribute ¶
gpu_resident_buffer_keys: set[tuple[str, str]] = {
("codes", "audio"),
("hidden_states", "last"),
("hidden_states", "trailing_text"),
}
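One way to read these keys is as (group, name) paths into a nested per-request buffer. That interpretation is an assumption, as are the helper name and the "meta" entry below; the sketch only shows how such a set could be used to separate entries that stay on the GPU from the rest.

```python
GPU_RESIDENT_BUFFER_KEYS = {
    ("codes", "audio"),
    ("hidden_states", "last"),
    ("hidden_states", "trailing_text"),
}

def split_by_residency(buffer):
    """Split a nested {group: {name: value}} dict into (resident, other).

    Hypothetical helper: assumes each key tuple addresses buffer[group][name].
    """
    resident, other = {}, {}
    for group, entries in buffer.items():
        for name, value in entries.items():
            is_resident = (group, name) in GPU_RESIDENT_BUFFER_KEYS
            target = resident if is_resident else other
            target.setdefault(group, {})[name] = value
    return resident, other

resident, other = split_by_residency(
    {
        "codes": {"audio": [1, 2]},
        "hidden_states": {"last": [0.5]},
        "meta": {"lang": "en"},  # illustrative entry, not a real buffer key
    }
)
```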
hf_to_vllm_mapper class-attribute instance-attribute ¶
hf_to_vllm_mapper = WeightsMapper(
orig_to_new_prefix={
"talker.model.layers.": "model.layers.",
"talker.model.norm.": "model.norm.",
"talker.model.codec_embedding.": "model.embed_tokens.",
"talker.codec_head.": "lm_head.",
"talker.model.text_embedding.": "text_embedding.",
"talker.text_projection.": "text_projection.",
"talker.code_predictor.": "code_predictor.",
"speaker_encoder.": "speaker_encoder.",
}
)
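The effect of the prefix table can be illustrated without vLLM. The sketch below is a plain-Python stand-in, not the WeightsMapper implementation; remap_name is a hypothetical helper and the parameter names passed to it are illustrative.

```python
ORIG_TO_NEW_PREFIX = {
    "talker.model.layers.": "model.layers.",
    "talker.model.norm.": "model.norm.",
    "talker.model.codec_embedding.": "model.embed_tokens.",
    "talker.codec_head.": "lm_head.",
    "talker.model.text_embedding.": "text_embedding.",
    "talker.text_projection.": "text_projection.",
    "talker.code_predictor.": "code_predictor.",
    "speaker_encoder.": "speaker_encoder.",
}

def remap_name(name: str) -> str:
    """Rewrite the first matching checkpoint prefix; leave other names unchanged."""
    for old, new in ORIG_TO_NEW_PREFIX.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name

# Illustrative checkpoint parameter names (assumed, not taken from a real file).
remapped_head = remap_name("talker.codec_head.weight")
remapped_embed = remap_name("talker.model.codec_embedding.weight")
```

Under this reading, the checkpoint's codec embedding and codec head land on the model's embed_tokens and lm_head slots, while speaker_encoder weights keep their names.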
lm_head instance-attribute ¶
lm_head = ParallelLMHead(
vocab_size,
hidden_size,
quant_config=quant_config,
prefix=maybe_prefix(prefix, "lm_head"),
)
make_empty_intermediate_tensors instance-attribute ¶
model instance-attribute ¶
requires_full_prefix_cached_hidden_states instance-attribute ¶
speaker_encoder instance-attribute ¶
speaker_encoder = Qwen3TTSSpeakerEncoder(
speaker_encoder_config
)
text_projection instance-attribute ¶
text_projection = Qwen3TTSTalkerResizeMLP(
text_hidden_size,
text_hidden_size,
hidden_size,
hidden_act,
bias=True,
)
compute_logits ¶
compute_logits(
hidden_states: Tensor | OmniOutput,
sampling_metadata: Any = None,
) -> Tensor | None
forward ¶
forward(
input_ids: Tensor,
positions: Tensor,
intermediate_tensors: IntermediateTensors | None = None,
inputs_embeds: Tensor | None = None,
**_: Any,
) -> Tensor | IntermediateTensors
make_omni_output ¶
make_omni_output(
model_outputs: Tensor | OmniOutput, **kwargs: Any
) -> OmniOutput
preprocess ¶
preprocess(
input_ids: Tensor,
input_embeds: Tensor | None,
**info_dict: Any,
) -> tuple[Tensor, Tensor, dict[str, Any]]
preprocess_batch ¶
preprocess_batch(
*,
req_ids: list[str],
model_intermediate_buffer: dict[str, dict[str, Any]],
device: device,
) -> None
Delegate batched preprocess to Qwen3TTSPromptEmbedsBuilder.
preprocess_decode_batch ¶
preprocess_decode_batch(
*, input_ids: Tensor, req_infos: list[dict[str, Any]]
) -> tuple[
Tensor, Tensor, Tensor, Tensor, list[dict[str, Any]]
]
Batch the decode-only preprocess path for Qwen3-TTS.
This mirrors the scalar decode branch in preprocess(), but performs the token embedding lookup once for the whole decode batch.
talker_mtp ¶
talker_mtp(
input_ids: Tensor,
input_embeds: Tensor,
last_talker_hidden: Tensor,
text_step: Tensor,
do_sample: bool | None = None,
temperature: float | None = None,
top_k: int | None = None,
top_p: float | None = None,
generator: Generator | None = None,
**kwargs: Any,
) -> tuple[Tensor, Tensor]
GPU fast-path used by OmniGPUModelRunner to predict residual codebooks (1..Q-1). Returns (inputs_embeds, audio_codes) for the current step.
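The temperature, top_k, and top_p parameters carry their conventional sampling meanings. How talker_mtp applies them internally is not shown in this reference, so the following is only a sketch of those conventions in plain Python; sample_token is a hypothetical name and the function assumes temperature is greater than zero.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Sample one index from logits after temperature, top-k and top-p filtering."""
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank candidates by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    if top_p is not None:
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        kept, mass = [], 0.0
        for i in ranked:
            kept.append(i)
            mass += probs[i]
            if mass >= top_p:
                break
        ranked = kept

    # random.choices accepts unnormalized weights.
    weights = [probs[i] for i in ranked]
    return rng.choices(ranked, weights=weights, k=1)[0]

greedy_like = sample_token([0.1, 5.0, 0.2], top_k=1)
```

With top_k=1 only the highest-scoring index survives the filter, so the call behaves like greedy decoding. Passing a seeded random.Random as rng plays a role similar to the generator argument.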
Qwen3TTSTalkerResizeMLP ¶
Bases: Module
Two-layer MLP that maps between hidden sizes with an activation in between.
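As a rough picture of what a two-layer resize MLP computes, here is a plain-Python sketch. The helper names, the list-based weights, and the ReLU activation are assumptions for illustration; the real module takes its activation from hidden_act and operates on tensors.

```python
def linear(x, weight, bias):
    """Dense layer: weight has one row per output feature."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

def relu(v):
    return max(v, 0.0)

def resize_mlp(x, w1, b1, w2, b2, act=relu):
    """First layer, activation, then a second layer that sets the output size."""
    hidden = [act(h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)

# Map a 2-dimensional input to a 3-dimensional output.
projected = resize_mlp(
    x=[1.0, -1.0],
    w1=[[1.0, 0.0], [0.0, 1.0]],
    b1=[0.0, 0.0],
    w2=[[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]],
    b2=[0.0, 0.0, 0.0],
)
```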
Res2NetBlock ¶
Bases: Module
blocks instance-attribute ¶
blocks = ModuleList(
[
(
TimeDelayNetBlock(
in_channel,
hidden_channel,
kernel_size=kernel_size,
dilation=dilation,
)
)
for _ in (range(scale - 1))
]
)
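The list holds scale - 1 blocks rather than scale, which matches the usual Res2Net pattern in ECAPA-TDNN implementations: the input is split into scale channel groups, the first group passes through unchanged, and each later group is processed together with the previous group's output. The forward method is not shown in this reference, so the sketch below is an assumption about that pattern, with plain callables standing in for TimeDelayNetBlock.

```python
def res2net_forward(chunks, blocks):
    """Apply the hierarchical Res2Net pattern to pre-split channel groups.

    chunks: `scale` channel groups (each a list of floats).
    blocks: `scale - 1` callables standing in for TimeDelayNetBlock.
    """
    assert len(blocks) == len(chunks) - 1
    outputs = []
    previous = None
    for i, chunk in enumerate(chunks):
        if i == 0:
            out = chunk  # first group is passed through untouched
        elif i == 1:
            out = blocks[0](chunk)
        else:
            # later groups also see the previous group's output
            mixed = [a + b for a, b in zip(chunk, previous)]
            out = blocks[i - 1](mixed)
        outputs.append(out)
        previous = out
    return outputs

def double(xs):
    return [2 * x for x in xs]

# scale = 3, so two stand-in blocks are needed.
result = res2net_forward([[1.0], [2.0], [3.0]], [double, double])
```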
SqueezeExcitationRes2NetBlock ¶
Bases: Module
TDNN-Res2Net-TDNN-SE building block used in ECAPA-TDNN.
res2net_block instance-attribute ¶
res2net_block = Res2NetBlock(
out_channels,
out_channels,
res2net_scale,
kernel_size,
dilation,
)
se_block instance-attribute ¶
se_block = SqueezeExcitationBlock(
out_channels, se_channels, out_channels
)
tdnn1 instance-attribute ¶
tdnn1 = TimeDelayNetBlock(
in_channels, out_channels, kernel_size=1, dilation=1
)
tdnn2 instance-attribute ¶
tdnn2 = TimeDelayNetBlock(
out_channels, out_channels, kernel_size=1, dilation=1
)
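The attributes above suggest the order named in the docstring: TDNN, Res2Net, TDNN, then squeeze-excitation. A minimal sketch of that composition follows, with plain callables standing in for the submodules. The residual connection at the end is an assumption based on common ECAPA-TDNN implementations and is not confirmed by this reference; it also requires the input and output channel counts to match.

```python
def se_res2net_forward(x, tdnn1, res2net_block, tdnn2, se_block):
    """Chain TDNN -> Res2Net -> TDNN -> SE, then add the input back (assumed)."""
    residual = x
    h = tdnn1(x)
    h = res2net_block(h)
    h = tdnn2(h)
    h = se_block(h)
    return [a + b for a, b in zip(h, residual)]

def add_one(xs):
    # Stand-in for a real submodule; keeps the vector length unchanged.
    return [v + 1.0 for v in xs]

out = se_res2net_forward([1.0, 2.0], add_one, add_one, add_one, add_one)
```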