vllm_omni.transformers_utils.processors.ming
SYSTEM_PROMPT_NOTHINK module-attribute
SYSTEM_PROMPT_THINK module-attribute
MingFlashOmniProcessor
Bases: ProcessorMixin
Top-level multimodal processor for Ming-flash-omni 2.0.
Adapted from Ming's BailingMM2Processor: https://github.com/inclusionAI/Ming/blob/3954fcb880ff5e61ff128bcf7f1ec344d46a6fe3/processing_bailingmm2.py
Subprocessors include:
- Qwen2VLImageProcessor (image/video)
- MingWhisperFeatureExtractor (modified audio processor using Whisper's log-mel spectrogram)
attributes class-attribute instance-attribute
audio_processor_class class-attribute instance-attribute
image_processor_class class-attribute instance-attribute
video_processor_class class-attribute instance-attribute
apply_chat_template
apply_chat_template(
    conversation: list[dict[str, Any]],
    sys_prompt_exp: str | None = None,
    use_cot_system_prompt: bool = False,
    **kwargs,
) -> str
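The signature suggests that the system prompt is chosen from the two module-level constants unless an explicit one is passed in. The sketch below illustrates one plausible reading of that choice. It is an assumption, not the library's confirmed behavior: the helper name `pick_system_prompt` is hypothetical, the prompt strings are placeholders, and the precedence of `sys_prompt_exp` over `use_cot_system_prompt` is a guess.

```python
from typing import Optional

# Placeholder values: the real contents of these module attributes are not shown on this page.
SYSTEM_PROMPT_NOTHINK = "<placeholder non-thinking system prompt>"
SYSTEM_PROMPT_THINK = "<placeholder thinking system prompt>"


def pick_system_prompt(
    sys_prompt_exp: Optional[str] = None,
    use_cot_system_prompt: bool = False,
) -> str:
    """Hypothetical helper mirroring two apply_chat_template parameters."""
    # Assumption: an explicitly supplied prompt overrides both defaults.
    if sys_prompt_exp is not None:
        return sys_prompt_exp
    # Assumption: the flag switches between the thinking and non-thinking prompts.
    return SYSTEM_PROMPT_THINK if use_cot_system_prompt else SYSTEM_PROMPT_NOTHINK


# Default call falls back to the non-thinking prompt under these assumptions.
default_prompt = pick_system_prompt()
```

Check the actual method body before relying on this precedence; the real implementation may combine the parameters differently.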
MingWhisperFeatureExtractor
Bases: FeatureExtractionMixin
Whisper log-mel feature extractor for Ming-flash-omni-2.0.
Produces audio_feats in the time-first packed format.
Adapted from:
- Ming's WhisperAudioEncoder: https://github.com/inclusionAI/Ming/blob/070dc3c13f95d97952ab7d22030df0c9e28a5122/modeling_whisper_encoder.py
- HF transformers WhisperFeatureExtractor: https://github.com/huggingface/transformers/blob/f842abaca95a7dbf3fc6e16122e7409109bc1431/src/transformers/models/whisper/feature_extraction_whisper.py#L33
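For readers unfamiliar with Whisper-style features, the following is a minimal NumPy sketch of a log-mel computation that returns frames along the first axis. It is not the MingWhisperFeatureExtractor implementation. The function names are hypothetical, the 80-bin mel count is assumed, the filterbank uses a simplified HTK-style mel scale rather than the one in HF transformers, and "time-first" is assumed here to mean an output of shape (frames, mel bins). Padding, batching, and packing of multiple clips are omitted.

```python
import numpy as np

SAMPLE_RATE = 16_000  # sampling rate Whisper expects
N_FFT = 400           # 25 ms analysis window
HOP_LENGTH = 160      # 10 ms hop
N_MELS = 80           # assumed; some Whisper variants use 128


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=np.float64) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank(n_mels=N_MELS, n_fft=N_FFT, sr=SAMPLE_RATE):
    """Triangular filters of shape (n_mels, n_fft // 2 + 1); simplified HTK scale."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    lower, center, upper = hz_points[:-2, None], hz_points[1:-1, None], hz_points[2:, None]
    rising = (fft_freqs[None, :] - lower) / (center - lower)
    falling = (upper - fft_freqs[None, :]) / (upper - center)
    return np.clip(np.minimum(rising, falling), 0.0, None)


def log_mel_time_first(waveform):
    """Return Whisper-style log-mel features with shape (n_frames, N_MELS)."""
    waveform = np.asarray(waveform, dtype=np.float64)
    # Center each frame on its hop position by reflect-padding both ends.
    padded = np.pad(waveform, N_FFT // 2, mode="reflect")
    n_frames = 1 + (len(padded) - N_FFT) // HOP_LENGTH
    window = np.hanning(N_FFT + 1)[:-1]  # periodic Hann window
    idx = np.arange(N_FFT)[None, :] + HOP_LENGTH * np.arange(n_frames)[:, None]
    power = np.abs(np.fft.rfft(padded[idx] * window, axis=1)) ** 2
    power = power[:-1]  # Whisper discards the final STFT frame
    mel = power @ mel_filterbank().T  # (n_frames, N_MELS): time is the first axis
    log_spec = np.log10(np.maximum(mel, 1e-10))
    # Whisper's normalization: clamp to an 8-decade dynamic range, then rescale.
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)
    return (log_spec + 4.0) / 4.0


# One second of a 440 Hz tone as a stand-in for real audio.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
feats = log_mel_time_first(np.sin(2 * np.pi * 440.0 * t))
```

With a 10 ms hop, one second of 16 kHz audio yields 100 frames, so `feats` has shape (100, 80). The values will not match the real extractor numerically because the mel filterbank here is a simplification.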