vllm_omni.model_executor.models.ming_tts ¶
Modules:
| Name | Description |
|---|---|
| aggregator | |
| audio_prep | |
| config_ming_tts | Ming dense checkpoint config adapters. |
| constants | |
| flowloss_head | |
| fm | |
| ming_tts | |
| ming_tts_audio_vae | |
| ming_tts_llm | |
| patch_emission | |
| pipeline | Ming TTS pipeline: Stage-0 LLM+flow -> Stage-1 audio VAE. |
| prompt_assembly | |
| prompt_encoder | |
| speaker_extractor | |
| validation | |
MingAudioVAEModel ¶
Bases: Module
enable_update_additional_information instance-attribute ¶
chunked_decode_streaming ¶
chunked_decode_streaming(
latent_chunk: Tensor, *, request_id: str, finished: bool
) -> tuple[Tensor, Any, Any, bool, bool]
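The signature above takes `request_id` and `finished` as keyword-only arguments, which suggests a caller feeds latent chunks one at a time and flags the last one. The following is a minimal sketch of such a driving loop, not the actual implementation: `StubStreamingDecoder` and `decode_request` are hypothetical names, plain lists stand in for tensors, the "decode" step is a placeholder, and the stub returns a simplified pair because the meaning of the real five-element tuple is not documented here.

```python
class StubStreamingDecoder:
    """Hypothetical stand-in; NOT the real MingAudioVAEModel."""

    def __init__(self) -> None:
        # Per-request bookkeeping, keyed by request_id (an assumption).
        self._chunks_seen: dict[str, int] = {}

    def chunked_decode_streaming(
        self, latent_chunk: list[float], *, request_id: str, finished: bool
    ) -> tuple[list[float], bool]:
        # Track how many chunks this request has sent so far.
        self._chunks_seen[request_id] = self._chunks_seen.get(request_id, 0) + 1
        # Placeholder "decode": the real model would run the audio VAE here.
        audio = [2.0 * x for x in latent_chunk]
        if finished:
            # Drop per-request state once the final chunk has arrived.
            self._chunks_seen.pop(request_id, None)
        return audio, finished


def decode_request(
    decoder: StubStreamingDecoder, request_id: str, chunks: list[list[float]]
) -> list[float]:
    """Feed chunks in order, marking only the last one as finished."""
    audio: list[float] = []
    for i, chunk in enumerate(chunks):
        is_last = i == len(chunks) - 1
        piece, _done = decoder.chunked_decode_streaming(
            chunk, request_id=request_id, finished=is_last
        )
        audio.extend(piece)
    return audio
```

Because the two trailing arguments are keyword-only, passing them positionally raises `TypeError`, which keeps call sites explicit about which request a chunk belongs to.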
compute_logits ¶
compute_logits(
hidden_states: Tensor | OmniOutput,
sampling_metadata: Any = None,
) -> None
MingLLMModel ¶
Bases: Module
flowloss instance-attribute ¶
flowloss = FlowLoss(
z_channels=latent_dim,
llm_cond_dim=llm_hidden_size,
**(ditar_config),
)
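The constructor call above fixes two keyword arguments and spreads the remaining ones from a config mapping. A small sketch of that pattern follows, using a hypothetical `StubFlowLoss` class; its `depth` and `hidden_size` parameters are invented for illustration and are not taken from the real `FlowLoss`.

```python
class StubFlowLoss:
    """Hypothetical stand-in; parameter names beyond the first two are assumptions."""

    def __init__(
        self,
        z_channels: int,
        llm_cond_dim: int,
        depth: int = 4,
        hidden_size: int = 256,
    ) -> None:
        self.z_channels = z_channels
        self.llm_cond_dim = llm_cond_dim
        self.depth = depth
        self.hidden_size = hidden_size


def build_head(
    latent_dim: int, llm_hidden_size: int, ditar_config: dict
) -> StubFlowLoss:
    # Mirrors the call shape in the reference: two explicit keywords,
    # everything else unpacked from the config mapping.
    return StubFlowLoss(
        z_channels=latent_dim,
        llm_cond_dim=llm_hidden_size,
        **ditar_config,
    )
```

One consequence of this call shape is that the config mapping must not repeat an explicitly passed keyword: if `ditar_config` contained a `z_channels` key, Python would raise `TypeError` at call time rather than silently overriding either value.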
hf_to_vllm_mapper class-attribute instance-attribute ¶
linear_proj_audio instance-attribute ¶
linear_proj_audio = Aggregator(
in_channels=latent_dim,
llm_input_dim=llm_hidden_size,
**(aggregator_config),
)
model instance-attribute ¶
compute_logits ¶
embed_input_ids ¶
embed_input_ids(
input_ids: Tensor,
inputs_embeds: Tensor | None = None,
**_: Any,
) -> Tensor
forward ¶
forward(
input_ids: Tensor,
positions: Tensor,
intermediate_tensors: IntermediateTensors | None = None,
inputs_embeds: Tensor | None = None,
latent_history: Tensor | None = None,
model_intermediate_buffer: list[dict[str, Any]]
| None = None,
seq_token_counts: list[int] | None = None,
**kwargs: object,
) -> OmniOutput | IntermediateTensors | Tensor
MingTTSForConditionalGeneration ¶
Bases: Module, SupportsPP, CustomProcessMixin