vllm_omni.model_executor.models.cosyvoice3.cosyvoice3
CosyVoice3DummyInputsBuilder
Bases: BaseDummyInputsBuilder[CosyVoice3MultiModalProcessingInfo]
get_dummy_mm_data
get_dummy_mm_data(
seq_len: int,
mm_counts: Mapping[str, int],
mm_options: Mapping[str, BaseDummyOptions] | None = None,
) -> MultiModalDataDict
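The signature suggests that the builder returns placeholder inputs per modality, with `mm_counts` giving how many items each modality needs. Below is a minimal stand-alone sketch of that shape, not the vllm_omni implementation. The function name `build_dummy_mm_data`, the `"audio"` modality key in the usage note, the `DUMMY_AUDIO_SAMPLES` length, and the use of plain lists in place of arrays are all assumptions made for illustration.

```python
from __future__ import annotations

from collections.abc import Mapping

# Hypothetical length of one placeholder clip; not taken from the library.
DUMMY_AUDIO_SAMPLES = 16_000


def build_dummy_mm_data(
    seq_len: int,
    mm_counts: Mapping[str, int],
    mm_options: Mapping[str, object] | None = None,
) -> dict[str, list[list[float]]]:
    """Return `mm_counts[modality]` silent placeholder clips per modality."""
    # seq_len and mm_options are accepted only to mirror the documented
    # signature; this sketch does not use them.
    return {
        modality: [[0.0] * DUMMY_AUDIO_SAMPLES for _ in range(count)]
        for modality, count in mm_counts.items()
    }
```

Under these assumptions, `build_dummy_mm_data(8, {"audio": 2})` yields a dict with one `"audio"` entry holding two all-zero clips.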
CosyVoice3Model
Bases: Module, SupportsMultiModal
enable_update_additional_information instance-attribute
supports_multimodal_raw_input_only class-attribute instance-attribute
talker instance-attribute
talker = CosyVoice3LM(
llm_input_size=llm["llm_input_size"],
llm_output_size=llm["llm_output_size"],
speech_token_size=llm["speech_token_size"],
llm=llm,
length_normalized_loss=llm["length_normalized_loss"],
lsm_weight=llm["lsm_weight"],
mix_ratio=llm["mix_ratio"],
)
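The constructor call above reads six keys from `llm` by subscript, so a missing key would surface as a `KeyError` when the model is built. Below is a minimal sketch of a pre-flight check, assuming `llm` is a plain dict-like mapping. The helper `missing_talker_keys` and the placeholder values are hypothetical and not part of vllm_omni.

```python
from collections.abc import Mapping

# Keys that the talker constructor call reads from the `llm` mapping.
REQUIRED_TALKER_KEYS = (
    "llm_input_size",
    "llm_output_size",
    "speech_token_size",
    "length_normalized_loss",
    "lsm_weight",
    "mix_ratio",
)


def missing_talker_keys(llm: Mapping[str, object]) -> list[str]:
    """Return the required keys absent from `llm`, in declaration order."""
    return [key for key in REQUIRED_TALKER_KEYS if key not in llm]


# Placeholder values for illustration only; real values come from the
# model configuration. "mix_ratio" is deliberately left out.
example_llm = {
    "llm_input_size": 1,
    "llm_output_size": 1,
    "speech_token_size": 1,
    "length_normalized_loss": True,
    "lsm_weight": 0.0,
}

print(missing_talker_keys(example_llm))
```

Running such a check before constructing the model reports every missing key at once, whereas the subscript lookups would stop at the first one.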
embed_input_ids
forward
forward(
input_ids: Tensor,
positions: Tensor,
intermediate_tensors: IntermediateTensors | None = None,
inputs_embeds: Tensor | None = None,
additional_information: dict[str, object] | None = None,
**kwargs: object,
) -> OmniOutput
get_language_model
Return the language model for upstream MoE detection.
CosyVoice3MultiModalProcessingInfo
Bases: BaseProcessingInfo
get_hf_config
If the config is not already present, pass it as a class; it will then be looked up in your model directory, so copy the config class there as well.
CosyVoice3MultiModalProcessor
Bases: BaseMultiModalProcessor[CosyVoice3MultiModalProcessingInfo]