Online Serving Example of vLLM-Omni for MiMo-Audio
Source: https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving/mimo_audio.
🛠️ Installation
Please refer to README.md.
⚠️ Important (audio generation)
For audio generation (TTS, responses that include synthesized audio, etc.), install flash-attn for your CUDA and PyTorch stack. Without it on GPU, output audio may be noise-only or unusable. See the FlashAttention repository for compatible builds.
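Before launching the server, it can help to confirm that flash-attn is visible to the Python environment. The following is a minimal sketch, assuming the package is importable under the module name flash_attn. The helper name flash_attn_available is hypothetical, and the check only confirms that the package is installed, not that the build matches your CUDA and PyTorch versions.

```python
import importlib.util


def flash_attn_available() -> bool:
    """Return True if a top-level module named flash_attn can be found."""
    # find_spec returns None for a missing top-level module instead of raising.
    return importlib.util.find_spec("flash_attn") is not None


if flash_attn_available():
    print("flash-attn found in this environment.")
else:
    print("flash-attn not found; generated audio may be noise-only on GPU.")
```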
Run examples (MiMo-Audio)
Launch the Server
export MIMO_AUDIO_TOKENIZER_PATH="XiaomiMiMo/MiMo-Audio-Tokenizer"
vllm serve XiaomiMiMo/MiMo-Audio-7B-Instruct --omni \
--served-model-name "MiMo-Audio-7B-Instruct" \
--port 18091 \
--chat-template ./examples/online_serving/mimo_audio/chat_template.jinja
⚠️ Important
MiMo-Audio is not compatible with the default chat template.
The provided chat_template.jinja implements MiMo-specific role, audio token, and instruction formatting and must be used for all inference.
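Once the server is running, one way to confirm it is reachable is to list the served models through the OpenAI-compatible /v1/models endpoint. This is a sketch under assumptions: the server runs on localhost with the port from the command above, and the helper names (models_url, served_model_ids, fetch_served_model_ids) are hypothetical.

```python
import json
import urllib.request

# Assumption: the server from the launch command runs on this machine.
BASE_URL = "http://localhost:18091"


def models_url(base_url: str = BASE_URL) -> str:
    """Build the URL of the OpenAI-compatible model listing endpoint."""
    return base_url.rstrip("/") + "/v1/models"


def served_model_ids(payload: dict) -> list[str]:
    """Extract model ids from a /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_served_model_ids(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Query the running server; raises URLError if it is not reachable."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        return served_model_ids(json.load(resp))
```

Calling fetch_served_model_ids() against the server started above should include the name passed to --served-model-name.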
Send Multi-modal Request
Change into the example folder (examples/online_serving/mimo_audio) before running the client.
Send request via Python
# Audio dialogue task
python openai_chat_completion_client_for_multimodal_generation.py \
--query-type multi_audios \
--message-json ../../offline_inference/mimo_audio/message_base64_wav.json
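For orientation, here is a rough sketch of what a raw text-only request to the server's OpenAI-compatible chat completions endpoint might look like, using only the standard library. It is not the bundled client: the payload that the client actually builds (system prompts, audio parts, extra fields) is not shown in this document, so treat the message layout, the localhost address, and the helper names build_chat_request and send as assumptions.

```python
import json
import urllib.request


def build_chat_request(
    prompt: str,
    base_url: str = "http://localhost:18091",   # assumption: local server, port from the launch command
    model: str = "MiMo-Audio-7B-Instruct",      # matches --served-model-name above
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 300.0) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, send(build_chat_request("Hello")) returns the response as a dictionary. Inspect its keys rather than assuming where any synthesized audio is stored.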
The Python client supports the following command-line arguments:
- --query-type (or -q): Query type (default: multi_audios)
  - Options: multi_audios, text
- --message-json (or -m): Path to a JSON file of base64-encoded multi-round audio messages
  - Do not pass this argument for the "text" query type
  - Supports local file paths (automatically encoded to base64) or HTTP/HTTPS URLs, only for the "Are these two audio clips the same?" task
  - Example: --message-json ./examples/offline_inference/mimo_audio/message_base64_wav.json
- --prompt (or -p): Custom text prompt/question, used only when the query type is "text" (TTS task)
  - Do not pass this argument for the "multi_audios" query type
  - Example: --prompt "What are the main activities shown in this video?"
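The argument rules above can be sketched with argparse. This is an illustration of the documented interface, not the client's actual source: the function names build_parser and parse_cli are hypothetical, and the real client may or may not reject conflicting arguments this way.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the three documented options with their short forms."""
    parser = argparse.ArgumentParser(description="MiMo-Audio client arguments (sketch)")
    parser.add_argument("--query-type", "-q", choices=["multi_audios", "text"],
                        default="multi_audios", help="Query type")
    parser.add_argument("--message-json", "-m", default=None,
                        help="JSON file with multi-round audio messages")
    parser.add_argument("--prompt", "-p", default=None,
                        help="Text prompt for the 'text' (TTS) query type")
    return parser


def parse_cli(argv: list[str]) -> argparse.Namespace:
    """Parse argv and enforce the documented pairing of options."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # --message-json belongs to multi_audios; --prompt belongs to text.
    if args.query_type == "text" and args.message_json is not None:
        parser.error("--message-json must not be passed with --query-type text")
    if args.query_type == "multi_audios" and args.prompt is not None:
        parser.error("--prompt must not be passed with --query-type multi_audios")
    return args
```

For instance, parse_cli(["-q", "text", "-p", "Hello"]) is accepted, while combining -q text with -m exits with an argparse usage error.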
For example, to use multi-round audio with local files:
python openai_chat_completion_client_for_multimodal_generation.py \
--query-type multi_audios \
--message-json ../../offline_inference/mimo_audio/message_base64_wav.json
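The base64 encoding of local files mentioned in the argument list can be sketched as follows. The schema of message_base64_wav.json and the exact form in which the client embeds audio are not shown in this document, so the data-URL output, the audio/wav MIME type, and the helper name to_audio_reference are assumptions for illustration only.

```python
import base64
from pathlib import Path


def to_audio_reference(source: str, mime: str = "audio/wav") -> str:
    """Return HTTP/HTTPS URLs unchanged; encode local files as base64 data URLs."""
    if source.startswith(("http://", "https://")):
        return source
    # Read the raw bytes and encode them as ASCII-safe base64 text.
    encoded = base64.b64encode(Path(source).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

A remote clip passes through untouched, while a local path such as ./clip.wav (a placeholder name) becomes a self-contained string that can be placed in a JSON message.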
Example materials
chat_template.jinja
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0]['role'] == 'system' %}
{{- messages[0]['content'] }}
{%- else %}
{{- 'You are a helpful assistant.' }}
{%- endif %}
{{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0]['role'] == 'system' %}
{%- set _m = '<|sosp|><|empty|><|eosp|>' -%}
{%- set _raw0 = messages[0]['content'] if messages[0]['content'] is string else '' -%}
{%- if _m in _raw0 %}
{%- set _t0 = (_raw0 | replace(_m ~ '\n', '') | replace(_m, '') | trim) -%}
{{- '<|im_start|>system\n' + (_t0 ~ _m if _t0 else _m) + '<|im_end|>\n' }}
{%- else %}
{{- '<|im_start|>system\n' + _raw0 + '<|im_end|>\n' }}
{%- endif %}
{%- else %}
{{- '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- for message in messages %}
{%- if message['role'] == 'assistant' %}
{{- '<|im_start|>assistant' }}
{%- set _sosp = '<|sosp|><|empty|><|eosp|>' -%}
{%- set _text = message['content'] if message['content'] is string else '' -%}
{%- if _sosp in _text %}
{%- set _clean = _text | replace(_sosp, '') -%}
{%- set _body = _clean[1:] if (_clean and _clean[0] == '\n') else _clean -%}
{{- '\n<|sostm|>' + _body + '<|eot|><|empty|><|eostm|>' }}
{%- else %}
{%- set _body = _text[1:] if (_text and _text[0] == '\n') else _text -%}
{{- '\n<|sostm|>' + _body + '<|eot|><|eostm|>' }}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message['role'] == 'user' %}
{%- set _m = '<|sosp|><|empty|><|eosp|>' -%}
{%- set _raw = message['content'] if message['content'] is string else '' -%}
{%- if _m in _raw %}
{%- set _t = (_raw | replace(_m ~ '\n', '') | replace(_m, '') | trim) -%}
{{- '<|im_start|>user\n' + (_t ~ _m if _t else _m) + '<|im_end|>\n' }}
{%- else %}
{{- '<|im_start|>user\n' + _raw + '<|im_end|>\n' }}
{%- endif %}
{%- elif message['role'] == 'system' %}
{%- if not loop.first %}
{%- set _m = '<|sosp|><|empty|><|eosp|>' -%}
{%- set _raw = message['content'] if message['content'] is string else '' -%}
{%- if _m in _raw %}
{%- set _t = (_raw | replace(_m ~ '\n', '') | replace(_m, '') | trim) -%}
{{- '<|im_start|>system\n' + (_t ~ _m if _t else _m) + '<|im_end|>\n' }}
{%- else %}
{{- '<|im_start|>system\n' + _raw + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- elif message['role'] == 'tool' %}
{%- if (loop.index0 == 0) or (messages[loop.index0 - 1]['role'] != 'tool') %}
{{- '<|im_start|>user' }}
{%- endif %}
{%- set _m = '<|sosp|><|empty|><|eosp|>' -%}
{%- set _raw = message['content'] if message['content'] is string else '' -%}
{%- if _m in _raw %}
{%- set _t = (_raw | replace(_m ~ '\n', '') | replace(_m, '') | trim) -%}
{{- '\n<tool_response>\n' + (_t ~ _m if _t else _m) + '\n</tool_response>' }}
{%- else %}
{{- '\n<tool_response>\n' + _raw + '\n</tool_response>' }}
{%- endif %}
{%- if loop.last or (messages[loop.index0 + 1]['role'] != 'tool') %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n<|sostm|>' }}
{%- endif %}
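To make the template's string handling easier to follow, the user and assistant branches above can be mirrored in plain Python. This is a reading aid written from the template text, not code from the repository, and the function names are hypothetical. Note that, as in the template, every occurrence of the audio marker in a user message is removed and a single marker is appended after the remaining text.

```python
AUDIO_MARKER = "<|sosp|><|empty|><|eosp|>"


def render_user_turn(content: str) -> str:
    """Mirror the template's 'user' branch for string content."""
    if AUDIO_MARKER in content:
        # Drop all markers (with or without a trailing newline), trim, then
        # put one marker after the remaining text.
        text = content.replace(AUDIO_MARKER + "\n", "").replace(AUDIO_MARKER, "").strip()
        body = text + AUDIO_MARKER if text else AUDIO_MARKER
    else:
        body = content
    return "<|im_start|>user\n" + body + "<|im_end|>\n"


def render_assistant_turn(content: str) -> str:
    """Mirror the template's 'assistant' branch for string content."""
    has_audio = AUDIO_MARKER in content
    text = content.replace(AUDIO_MARKER, "") if has_audio else content
    # A single leading newline is stripped before wrapping the text.
    body = text[1:] if text.startswith("\n") else text
    tail = "<|eot|><|empty|><|eostm|>" if has_audio else "<|eot|><|eostm|>"
    return "<|im_start|>assistant\n<|sostm|>" + body + tail + "<|im_end|>\n"


print(render_user_turn(AUDIO_MARKER + "\nDescribe this clip."))
print(render_assistant_turn("\nSure."))
```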
openai_chat_completion_client_for_multimodal_generation.py
Large file omitted from the rendered docs. View it on GitHub: https://github.com/vllm-project/vllm-omni/blob/main/examples/online_serving/mimo_audio/openai_chat_completion_client_for_multimodal_generation.py.