Skip to content

vLLM-Omni · MiniCPM-o 4.5 Online Demo

Source https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving/minicpmo.

Gradio-based web UI for MiniCPM-o 4.5 served via vllm-omni's OpenAI-compatible endpoints.

The UI supports:

  • Inputs: text prompt + optional image, audio (file or mic), video.
  • Outputs: text + speech (WAV player).

1. Start the backend server

The deploy config auto-loads via --omni; the default vllm_omni/deploy/minicpmo_4_5.yaml targets a 2-GPU layout (thinker on GPU 0, talker + t2w sharing GPU 1). For other hardware layouts pick one of the deploy variants below.

deploy config GPUs Notes
minicpmo_4_5.yaml (default) 2 Thinker on GPU0, talker+t2w on GPU1.
minicpmo_4_5_3gpu.yaml 3 Thinker 2-way TP on GPU0/1, talker+t2w share GPU2.
minicpmo_4_5_8x4090.yaml 8 Full 8x4090 layout.

Default (2-GPU):

vllm serve openbmb/MiniCPM-o-4_5 --omni \
    --trust-remote-code \
    --host 0.0.0.0 --port 8099

Other layouts via --deploy-config:

vllm serve openbmb/MiniCPM-o-4_5 --omni \
    --deploy-config vllm_omni/deploy/minicpmo_4_5_8x4090.yaml \
    --trust-remote-code \
    --host 0.0.0.0 --port 8099

2. Launch the Gradio demo

bash examples/online_serving/minicpmo/run_gradio_demo.sh

# Or run the python entry point directly:
python examples/online_serving/minicpmo/gradio_demo.py \
    --minicpmo45-api-base http://localhost:8099/v1 \
    --minicpmo45-model openbmb/MiniCPM-o-4_5 \
    --port 7862

Open http://<host>:7862 in a browser.

Notes

  • TTS trigger: the demo sets extra_body.chat_template_kwargs.use_tts_template=True, which appends <|tts_bos|> to the assistant prefix.
  • Uncheck "Generate speech output (TTS)" to get text-only responses (faster).
  • The audio output is the raw WAV returned by the stage-1 talker + Token2Wav; sample rate is 24 kHz.
  • Video input is forwarded as a base64 video_url entry; the server needs decord/torchvision to decode it.

Example materials

gradio_demo.py

Large file omitted from the rendered docs. View it on GitHub: https://github.com/vllm-project/vllm-omni/blob/main/examples/online_serving/minicpmo/gradio_demo.py.

run_gradio_demo.sh
#!/bin/bash
# Launch the MiniCPM-o 4.5 gradio demo.
#
# Prereq:
#   Start a vllm-omni OpenAI server for MiniCPM-o 4.5 on :8099 (see the
#   8x4090 stage config under vllm_omni/model_executor/stage_configs).
set -e

HERE="$(cd "$(dirname "$0")" && pwd)"

: "${MINICPMO45_API_BASE:=http://localhost:8099/v1}"
: "${MINICPMO45_MODEL:=openbmb/MiniCPM-o-4_5}"
: "${GRADIO_HOST:=0.0.0.0}"
: "${GRADIO_PORT:=7862}"
# HTTPS (browsers require a secure context for microphone access).
# Set GRADIO_SSL_CERTFILE / GRADIO_SSL_KEYFILE to enable TLS.
: "${GRADIO_SSL_CERTFILE:=}"
: "${GRADIO_SSL_KEYFILE:=}"

export MINICPMO45_API_BASE MINICPMO45_MODEL

SSL_ARGS=()
if [ -n "$GRADIO_SSL_CERTFILE" ] && [ -n "$GRADIO_SSL_KEYFILE" ] \
   && [ -f "$GRADIO_SSL_CERTFILE" ] && [ -f "$GRADIO_SSL_KEYFILE" ]; then
  SSL_ARGS=(--ssl-certfile "$GRADIO_SSL_CERTFILE" --ssl-keyfile "$GRADIO_SSL_KEYFILE")
  echo "HTTPS enabled: cert=$GRADIO_SSL_CERTFILE key=$GRADIO_SSL_KEYFILE"
else
  echo "HTTPS disabled (cert/key not found). Microphone won't work on remote browsers."
fi

exec python "$HERE/gradio_demo.py" \
    --minicpmo45-api-base "$MINICPMO45_API_BASE" \
    --minicpmo45-model "$MINICPMO45_MODEL" \
    --host "$GRADIO_HOST" \
    --port "$GRADIO_PORT" \
    "${SSL_ARGS[@]}"