vLLM-Omni · MiniCPM-o 4.5 Online Demo¶
Source https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving/minicpmo.
Gradio-based web UI for MiniCPM-o 4.5 served via vllm-omni's OpenAI-compatible endpoints.
The UI supports:
- Inputs: text prompt + optional image, audio (file or mic), video.
- Outputs: text + speech (WAV player).
1. Start the backend server¶
The deploy config auto-loads via --omni; the default vllm_omni/deploy/minicpmo_4_5.yaml targets a 2-GPU layout (thinker on GPU 0, talker + t2w sharing GPU 1). For other hardware layouts pick one of the deploy variants below.
| deploy config | GPUs | Notes |
|---|---|---|
minicpmo_4_5.yaml (default) | 2 | Thinker on GPU0, talker+t2w on GPU1. |
minicpmo_4_5_3gpu.yaml | 3 | Thinker 2-way TP on GPU0/1, talker+t2w share GPU2. |
minicpmo_4_5_8x4090.yaml | 8 | Full 8x4090 layout. |
Default (2-GPU):
Other layouts via --deploy-config:
vllm serve openbmb/MiniCPM-o-4_5 --omni \
--deploy-config vllm_omni/deploy/minicpmo_4_5_8x4090.yaml \
--trust-remote-code \
--host 0.0.0.0 --port 8099
2. Launch the Gradio demo¶
bash examples/online_serving/minicpmo/run_gradio_demo.sh
# Or run the python entry point directly:
python examples/online_serving/minicpmo/gradio_demo.py \
--minicpmo45-api-base http://localhost:8099/v1 \
--minicpmo45-model openbmb/MiniCPM-o-4_5 \
--port 7862
Open http://<host>:7862 in a browser.
Notes¶
- TTS trigger: the demo sets
extra_body.chat_template_kwargs.use_tts_template=True, which appends<|tts_bos|>to the assistant prefix. - Uncheck "Generate speech output (TTS)" to get text-only responses (faster).
- The audio output is the raw WAV returned by the stage-1 talker + Token2Wav; sample rate is 24 kHz.
- Video input is forwarded as a base64
video_urlentry; the server needs decord/torchvision to decode it.
Example materials¶
gradio_demo.py
Large file omitted from the rendered docs. View it on GitHub: https://github.com/vllm-project/vllm-omni/blob/main/examples/online_serving/minicpmo/gradio_demo.py.
run_gradio_demo.sh
#!/bin/bash
# Launch the MiniCPM-o 4.5 gradio demo.
#
# Prereq:
# Start a vllm-omni OpenAI server for MiniCPM-o 4.5 on :8099 (see the
# 8x4090 stage config under vllm_omni/model_executor/stage_configs).
set -e
HERE="$(cd "$(dirname "$0")" && pwd)"
: "${MINICPMO45_API_BASE:=http://localhost:8099/v1}"
: "${MINICPMO45_MODEL:=openbmb/MiniCPM-o-4_5}"
: "${GRADIO_HOST:=0.0.0.0}"
: "${GRADIO_PORT:=7862}"
# HTTPS (browsers require a secure context for microphone access).
# Set GRADIO_SSL_CERTFILE / GRADIO_SSL_KEYFILE to enable TLS.
: "${GRADIO_SSL_CERTFILE:=}"
: "${GRADIO_SSL_KEYFILE:=}"
export MINICPMO45_API_BASE MINICPMO45_MODEL
SSL_ARGS=()
if [ -n "$GRADIO_SSL_CERTFILE" ] && [ -n "$GRADIO_SSL_KEYFILE" ] \
&& [ -f "$GRADIO_SSL_CERTFILE" ] && [ -f "$GRADIO_SSL_KEYFILE" ]; then
SSL_ARGS=(--ssl-certfile "$GRADIO_SSL_CERTFILE" --ssl-keyfile "$GRADIO_SSL_KEYFILE")
echo "HTTPS enabled: cert=$GRADIO_SSL_CERTFILE key=$GRADIO_SSL_KEYFILE"
else
echo "HTTPS disabled (cert/key not found). Microphone won't work on remote browsers."
fi
exec python "$HERE/gradio_demo.py" \
--minicpmo45-api-base "$MINICPMO45_API_BASE" \
--minicpmo45-model "$MINICPMO45_MODEL" \
--host "$GRADIO_HOST" \
--port "$GRADIO_PORT" \
"${SSL_ARGS[@]}"