vllm_omni.transformers_utils.processors.ming

ASSISTANT_PREFIX module-attribute

ASSISTANT_PREFIX = '<role>ASSISTANT</role>'

DEFAULT_AUDIO_PATCH_TOKEN module-attribute

DEFAULT_AUDIO_PATCH_TOKEN = '<audioPatch>'

DEFAULT_AU_END_TOKEN module-attribute

DEFAULT_AU_END_TOKEN = '</audio>'

DEFAULT_AU_START_TOKEN module-attribute

DEFAULT_AU_START_TOKEN = '<audio>'

DEFAULT_FRAME_PATCH_TOKEN module-attribute

DEFAULT_FRAME_PATCH_TOKEN = '<framePatch>'

DEFAULT_IMAGE_PATCH_TOKEN module-attribute

DEFAULT_IMAGE_PATCH_TOKEN = '<imagePatch>'

DEFAULT_IM_END_TOKEN module-attribute

DEFAULT_IM_END_TOKEN = '</image>'

DEFAULT_IM_START_TOKEN module-attribute

DEFAULT_IM_START_TOKEN = '<image>'

DEFAULT_VID_END_TOKEN module-attribute

DEFAULT_VID_END_TOKEN = '</video>'

DEFAULT_VID_START_TOKEN module-attribute

DEFAULT_VID_START_TOKEN = '<video>'

PLACEHOLDER_AUDIO_TOKEN_IN_TEXT module-attribute

PLACEHOLDER_AUDIO_TOKEN_IN_TEXT = '<AUDIO>'

PLACEHOLDER_IMAGE_TOKEN_IN_TEXT module-attribute

PLACEHOLDER_IMAGE_TOKEN_IN_TEXT = '<IMAGE>'

PLACEHOLDER_VIDEO_TOKEN_IN_TEXT module-attribute

PLACEHOLDER_VIDEO_TOKEN_IN_TEXT = '<VIDEO>'

SYSTEM_PROMPT_NOTHINK module-attribute

SYSTEM_PROMPT_NOTHINK = "<role>SYSTEM</role>你是一个友好的AI助手。\n\ndetailed thinking off"

SYSTEM_PROMPT_THINK module-attribute

SYSTEM_PROMPT_THINK = "<role>SYSTEM</role>你是一个友好的AI助手。\n\ndetailed thinking on"

USER_PREFIX module-attribute

USER_PREFIX = '<role>HUMAN</role>'

logger module-attribute

logger = get_logger(__name__)
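
The token constants above suggest a two-stage scheme: prompts carry an upper-case placeholder such as `<IMAGE>`, which is later replaced by a start token, a run of patch tokens, and an end token. This page does not show the expansion logic, so the following is only a minimal sketch under that assumption. The helper `expand_image_placeholders` and its `patch_counts` argument are hypothetical and not part of this module.

```python
# Token values copied from the module attributes listed above.
DEFAULT_IM_START_TOKEN = "<image>"
DEFAULT_IM_END_TOKEN = "</image>"
DEFAULT_IMAGE_PATCH_TOKEN = "<imagePatch>"
PLACEHOLDER_IMAGE_TOKEN_IN_TEXT = "<IMAGE>"


def expand_image_placeholders(text: str, patch_counts: list[int]) -> str:
    """Hypothetical helper: replace the i-th <IMAGE> placeholder with
    <image> + patch_counts[i] copies of <imagePatch> + </image>."""
    parts = text.split(PLACEHOLDER_IMAGE_TOKEN_IN_TEXT)
    if len(parts) - 1 != len(patch_counts):
        raise ValueError("number of placeholders and patch counts differ")
    out = [parts[0]]
    for count, tail in zip(patch_counts, parts[1:]):
        out.append(
            DEFAULT_IM_START_TOKEN
            + DEFAULT_IMAGE_PATCH_TOKEN * count
            + DEFAULT_IM_END_TOKEN
        )
        out.append(tail)
    return "".join(out)


expanded = expand_image_placeholders("Describe <IMAGE> briefly.", [2])
```

The same pattern would apply to the audio and video tokens. How many patch tokens each item receives in the real processor (for example, whether it depends on `spatial_merge_size`) is not stated on this page.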

MingFlashOmniProcessor

Bases: ProcessorMixin

Top-level multimodal processor for Ming-flash-omni 2.0.

Adapted from Ming's BailingMM2Processor: https://github.com/inclusionAI/Ming/blob/3954fcb880ff5e61ff128bcf7f1ec344d46a6fe3/processing_bailingmm2.py

Subprocessors include:

- Qwen2VLImageProcessor (image/video)
- MingWhisperFeatureExtractor (modified audio processor using Whisper's log-mel spectrogram)

attributes class-attribute instance-attribute

attributes = [
    "image_processor",
    "audio_processor",
    "tokenizer",
]

audio_processor_class class-attribute instance-attribute

audio_processor_class = 'AutoFeatureExtractor'

audio_token instance-attribute

chat_template instance-attribute

chat_template = getattr(tokenizer, 'chat_template', None)

image_processor_class class-attribute instance-attribute

image_processor_class = 'AutoImageProcessor'

image_token instance-attribute

model_input_names property

model_input_names

spatial_merge_size instance-attribute

spatial_merge_size = merge_size

tokenizer_class class-attribute instance-attribute

tokenizer_class = 'AutoTokenizer'

video_processor instance-attribute

video_processor = video_processor

video_processor_class class-attribute instance-attribute

video_processor_class = 'AutoVideoProcessor'

video_token instance-attribute

apply_chat_template

apply_chat_template(
    conversation: list[dict[str, Any]],
    sys_prompt_exp: str | None = None,
    use_cot_system_prompt: bool = False,
    **kwargs,
) -> str

apply_system_template

apply_system_template(
    sys_prompt_exp: str | None = None,
    use_cot_system_prompt: bool = False,
) -> str

batch_decode

batch_decode(*args, **kwargs)

decode

decode(*args, **kwargs)

from_pretrained classmethod

from_pretrained(
    pretrained_model_name_or_path, *args, **kwargs
)

save_pretrained

save_pretrained(
    save_directory, push_to_hub: bool = False, **kwargs
)

MingWhisperFeatureExtractor

Bases: FeatureExtractionMixin

Whisper log-mel feature extractor for Ming-flash-omni-2.0.

Produces audio_feats in the time-first packed format.

Adapted from:

- Ming's WhisperAudioEncoder: https://github.com/inclusionAI/Ming/blob/070dc3c13f95d97952ab7d22030df0c9e28a5122/modeling_whisper_encoder.py
- HF transformers WhisperFeatureExtractor: https://github.com/huggingface/transformers/blob/f842abaca95a7dbf3fc6e16122e7409109bc1431/src/transformers/models/whisper/feature_extraction_whisper.py#L33
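
The phrase "time-first packed format" is not defined on this page. One common interpretation, assumed here, is that each clip's features are laid out as `[T, n_mels]` (time on the first axis), the clips are concatenated along time, and the per-clip frame counts are returned separately. The sketch below illustrates that layout with nested lists standing in for arrays. `pack_time_first` is a hypothetical helper; only the output names `audio_feats` and `audio_feats_lengths` come from `model_input_names`.

```python
def pack_time_first(clips: list[list[list[float]]]) -> tuple[list[list[float]], list[int]]:
    """Sketch only: concatenate per-clip [T_i, n_mels] features along the
    time axis and record each clip's frame count T_i."""
    audio_feats: list[list[float]] = []
    audio_feats_lengths: list[int] = []
    for frames in clips:
        audio_feats.extend(frames)               # append T_i frames
        audio_feats_lengths.append(len(frames))  # remember T_i for unpacking
    return audio_feats, audio_feats_lengths


# Two dummy clips with n_mels = 4: one with 3 frames, one with 2 frames.
clip_a = [[0.0] * 4 for _ in range(3)]
clip_b = [[1.0] * 4 for _ in range(2)]
audio_feats, audio_feats_lengths = pack_time_first([clip_a, clip_b])
```

With this layout a consumer can recover clip boundaries from the running sum of `audio_feats_lengths` without any padding frames.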

feature_size instance-attribute

feature_size = feature_size

model_input_names class-attribute instance-attribute

model_input_names = ['audio_feats', 'audio_feats_lengths']

n_mels property

n_mels: int

sampling_rate instance-attribute

sampling_rate = sampling_rate

raise_missing_video_processor

raise_missing_video_processor()