Image-To-Video

Source: https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving/image_to_video

This example demonstrates how to deploy the Wan2.2 image-to-video model for online video generation using vLLM-Omni.

Start Server

Basic Start

vllm serve Wan-AI/Wan2.2-I2V-A14B-Diffusers --omni --port 8091

Start with Parameters

Or use the startup script:

bash run_server.sh

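The script reads the following environment variables:

- MODEL (default: Wan-AI/Wan2.2-I2V-A14B-Diffusers)
- PORT (default: 8099)
- CACHE_BACKEND (default: none)
- ENABLE_CACHE_DIT_SUMMARY (default: 0)

The script's default port (8099) differs from the port used by the curl commands on this page (8091), so set PORT=8091 when following them. The copy of run_server.sh reproduced under Example materials does not read BOUNDARY_RATIO or FLOW_SHIFT; set boundary_ratio (0.875 in the examples) and flow_shift (12.0 in the examples) per request instead.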

Ascend / Local LightX2V Example

For a local Wan2.2-LightX2V Diffusers directory on Ascend/NPU, you can start the server like this:

vllm serve /path/to/Wan2.2-I2V-A14B-LightX2V-Diffusers-Lightning \
  --omni \
  --port 8091 \
  --flow-shift 12 \
  --cfg-parallel-size 1 \
  --ulysses-degree 4 \
  --use-hsdp \
  --trust-remote-code \
  --allowed-local-media-path / \
  --seed 42

Async Job Behavior

POST /v1/videos is asynchronous. It creates a video job and immediately returns metadata like the job ID and initial queued status. To get the final artifact, poll the job status and then download the completed file from the content endpoint.

The main endpoints are:

- POST /v1/videos: create a video generation job (async)
- POST /v1/videos/sync: generate a video and return raw bytes (sync, for benchmarks)
- GET /v1/videos/{video_id}: retrieve the current job status and metadata
- GET /v1/videos: list stored video jobs
- GET /v1/videos/{video_id}/content: download the generated video file
- DELETE /v1/videos/{video_id}: delete the job and any stored output

Sync API (Benchmark / Testing)

POST /v1/videos/sync is a synchronous alternative that blocks until generation completes and returns the raw video bytes (video/mp4) directly in the response body. It is designed for benchmark and testing scenarios where one-shot request/response latency measurement is needed.

The sync endpoint accepts the same form parameters as POST /v1/videos. It does not create a stored job record; the response body contains only the generated video file. Metadata is returned via response headers:

X-Request-Id: unique identifier for this generation request
X-Model: model name used for generation
X-Inference-Time-S: wall-clock inference time in seconds

curl -X POST http://localhost:8091/v1/videos/sync \
  -F "prompt=A bear playing with yarn, smooth motion" \
  -F "input_reference=@/path/to/input.png" \
  -F "size=832x480" \
  -F "num_frames=33" \
  -F "fps=16" \
  -F "negative_prompt=low quality, blurry, static" \
  -F "num_inference_steps=40" \
  -F "guidance_scale=1.0" \
  -F "guidance_scale_2=1.0" \
  -F "boundary_ratio=0.875" \
  -F "flow_shift=12.0" \
  -F 'extra_params={"sample_solver":"euler"}' \
  -F "seed=42" \
  -o sync_i2v_output.mp4

For Wan Lightning/Distill checkpoints, pass {"sample_solver":"euler"} via extra_params. The default solver is unipc.

Example matching the local LightX2V deployment above:

curl -sS -X POST http://localhost:8091/v1/videos/sync \
  -H "Accept: video/mp4" \
  -F "prompt=A cat playing with yarn" \
  -F "input_reference=@/path/to/input.jpg" \
  -F "width=832" \
  -F "height=480" \
  -F "num_frames=81" \
  -F "fps=16" \
  -F "num_inference_steps=4" \
  -F "guidance_scale=1.0" \
  -F "guidance_scale_2=1.0" \
  -F "boundary_ratio=0.875" \
  -F "seed=42" \
  -F 'extra_params={"sample_solver":"euler"}' \
  -o ./output.mp4

Use /v1/videos/sync if you want to write the MP4 directly to a file. POST /v1/videos is async and returns job metadata, not inline b64_json.
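The metadata headers listed earlier in this section can be captured with curl's -D (dump headers) option and read back afterwards. The following is a minimal sketch rather than part of the example repository: header_value is a hypothetical helper, and sync_headers.txt is an arbitrary file name.

```shell
# header_value FILE NAME: print the value of response header NAME from a
# header dump written by `curl -D FILE`. Header names are matched
# case-insensitively and the trailing carriage return is stripped.
header_value() {
  awk -v name="$2" '
    BEGIN { name = tolower(name) }
    {
      sub(/\r$/, "")
      sep = index($0, ":")
      if (sep > 0 && tolower(substr($0, 1, sep - 1)) == name) {
        value = substr($0, sep + 1)
        sub(/^[ \t]+/, "", value)
        print value
        exit
      }
    }
  ' "$1"
}

# Usage (needs a running server, so it is left commented out):
#   curl -sS -X POST http://localhost:8091/v1/videos/sync \
#     -D sync_headers.txt \
#     -F "prompt=A bear playing with yarn, smooth motion" \
#     -F "input_reference=@/path/to/input.png" \
#     -o sync_i2v_output.mp4
#   echo "inference time: $(header_value sync_headers.txt X-Inference-Time-S) s"
```

Reading the header from a dump file keeps the MP4 bytes and the metadata separate, which is convenient when a benchmark loop collects X-Inference-Time-S across many requests.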

Storage

Generated video files are stored on local disk by the async video API. Local file storage behavior can be controlled via the following environment variables:

VLLM_OMNI_SERVER_STORAGE__PATH: directory used for generated files (default: /tmp/storage)
VLLM_OMNI_SERVER_STORAGE__FILE_CONCURRENCY: max concurrent save/delete operations (default: 4)

VLLM_OMNI_STORAGE_PATH and VLLM_OMNI_STORAGE_MAX_CONCURRENCY are deprecated and will be removed in a future release; use the names above instead.

Example:

export VLLM_OMNI_SERVER_STORAGE__PATH=/var/tmp/vllm-omni-videos
export VLLM_OMNI_SERVER_STORAGE__FILE_CONCURRENCY=8
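Launch scripts that still export the deprecated names can carry the values over before starting the server. The sketch below is a hypothetical shim, not something shipped with vLLM-Omni; it assumes the deprecated and current names correspond one-to-one in the order they are listed above.

```shell
# migrate_storage_env: copy values from the deprecated variable names to the
# current ones, but only when the current name is not already set.
# Assumption: VLLM_OMNI_STORAGE_PATH maps to VLLM_OMNI_SERVER_STORAGE__PATH and
# VLLM_OMNI_STORAGE_MAX_CONCURRENCY maps to
# VLLM_OMNI_SERVER_STORAGE__FILE_CONCURRENCY.
migrate_storage_env() {
  if [ -z "${VLLM_OMNI_SERVER_STORAGE__PATH:-}" ] && [ -n "${VLLM_OMNI_STORAGE_PATH:-}" ]; then
    export VLLM_OMNI_SERVER_STORAGE__PATH="$VLLM_OMNI_STORAGE_PATH"
  fi
  if [ -z "${VLLM_OMNI_SERVER_STORAGE__FILE_CONCURRENCY:-}" ] && [ -n "${VLLM_OMNI_STORAGE_MAX_CONCURRENCY:-}" ]; then
    export VLLM_OMNI_SERVER_STORAGE__FILE_CONCURRENCY="$VLLM_OMNI_STORAGE_MAX_CONCURRENCY"
  fi
  # Drop the deprecated names so only one spelling reaches the server.
  unset VLLM_OMNI_STORAGE_PATH VLLM_OMNI_STORAGE_MAX_CONCURRENCY
}
```

Calling migrate_storage_env once before vllm serve leaves any value already set under the current names untouched.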

API Calls

Method 1: Using curl

# Basic image-to-video generation
bash run_curl_image_to_video.sh

# Wan Lightning/Distill checkpoints
SAMPLE_SOLVER=euler bash run_curl_image_to_video.sh

# Or execute directly (OpenAI-style multipart)
create_response=$(curl -s http://localhost:8091/v1/videos \
  -H "Accept: application/json" \
  -F "prompt=A bear playing with yarn, smooth motion" \
  -F "negative_prompt=low quality, blurry, static" \
  -F "input_reference=@/path/to/qwen-bear.png" \
  -F "width=832" \
  -F "height=480" \
  -F "num_frames=33" \
  -F "fps=16" \
  -F "num_inference_steps=40" \
  -F "guidance_scale=1.0" \
  -F "guidance_scale_2=1.0" \
  -F "boundary_ratio=0.875" \
  -F "flow_shift=12.0" \
  -F 'extra_params={"sample_solver":"euler"}' \
  -F "seed=42")

video_id=$(echo "$create_response" | jq -r '.id')
if [ -z "$video_id" ] || [ "$video_id" = "null" ]; then
  echo "Failed to create video job: $create_response"
  exit 1
fi
while true; do
  status=$(curl -s "http://localhost:8091/v1/videos/${video_id}" | jq -r '.status')
  if [ "$status" = "completed" ]; then
    break
  fi
  if [ "$status" = "failed" ]; then
    echo "Video generation failed"
    exit 1
  fi
  sleep 2
done

curl -s "http://localhost:8091/v1/videos/${video_id}" | jq .
curl -L "http://localhost:8091/v1/videos/${video_id}/content" -o wan22_i2v_output.mp4

Request Format

Required Fields

curl -X POST http://localhost:8091/v1/videos \
  -F "prompt=A bear playing with yarn, smooth motion" \
  -F "input_reference=@/path/to/qwen-bear.png"

Alternative JSON-Safe Reference Input

Use image_reference when you want to pass a URL or JSON-safe image reference instead of uploading a file. Do not send input_reference and image_reference together.

curl -X POST http://localhost:8091/v1/videos \
  -F "prompt=A bear playing with yarn, smooth motion" \
  -F 'image_reference={"image_url":"https://example.com/qwen-bear.png"}'
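Because the two fields are mutually exclusive, a client script can pick one from the form of its input. The following sketch uses a hypothetical helper, reference_form_field, and assumes plain URLs and paths that contain no double quotes, backslashes, or semicolons (which would need escaping for JSON or for curl's -F syntax).

```shell
# reference_form_field SOURCE: print the multipart form field for SOURCE.
# http(s) URLs become a JSON image_reference; anything else is treated as a
# local file path and uploaded through input_reference.
reference_form_field() {
  case "$1" in
    http://*|https://*)
      printf 'image_reference={"image_url":"%s"}\n' "$1"
      ;;
    *)
      printf 'input_reference=@%s\n' "$1"
      ;;
  esac
}

# Usage (needs a running server, so it is left commented out):
#   curl -X POST http://localhost:8091/v1/videos \
#     -F "prompt=A bear playing with yarn, smooth motion" \
#     -F "$(reference_form_field https://example.com/qwen-bear.png)"
```

Since the helper prints exactly one field, a request built this way never sends input_reference and image_reference together.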

Generation with Parameters

curl -X POST http://localhost:8091/v1/videos \
  -F "prompt=A bear playing with yarn, smooth motion" \
  -F "negative_prompt=low quality, blurry, static" \
  -F "input_reference=@/path/to/qwen-bear.png" \
  -F "width=832" \
  -F "height=480" \
  -F "num_frames=33" \
  -F "fps=16" \
  -F "num_inference_steps=40" \
  -F "guidance_scale=1.0" \
  -F "guidance_scale_2=1.0" \
  -F "boundary_ratio=0.875" \
  -F "flow_shift=12.0" \
  -F 'extra_params={"sample_solver":"euler"}' \
  -F "seed=42"

sample_solver is supported by Wan2.2 online serving through the existing extra_params field, which is merged into the pipeline extra_args. Use unipc for the default multistep solver, or euler for Lightning/Distill checkpoints.
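A small wrapper can build the extra_params value and catch typos in the solver name before the request is sent. This is a sketch with a hypothetical helper, solver_extra_params; its allowlist contains only the two solver names mentioned on this page, and the server may well accept others.

```shell
# solver_extra_params SOLVER: print the extra_params JSON for SOLVER, or
# return 1 if SOLVER is not one of the two values named in this document.
solver_extra_params() {
  case "$1" in
    unipc|euler)
      printf '{"sample_solver":"%s"}\n' "$1"
      ;;
    *)
      echo "unsupported sample_solver: $1" >&2
      return 1
      ;;
  esac
}

# Usage (needs a running server, so it is left commented out):
#   extra_params="$(solver_extra_params euler)" || exit 1
#   curl -X POST http://localhost:8091/v1/videos \
#     -F "prompt=A bear playing with yarn, smooth motion" \
#     -F "input_reference=@/path/to/qwen-bear.png" \
#     -F "extra_params=${extra_params}"
```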

Create Response Format

POST /v1/videos returns a job record, not inline base64 video data.

{
  "id": "video_gen_123",
  "object": "video",
  "status": "queued",
  "model": "Wan-AI/Wan2.2-I2V-A14B-Diffusers",
  "prompt": "A bear playing with yarn, smooth motion",
  "created_at": 1234567890
}

Retrieve, List, Download, and Delete

Retrieve a job

curl -s http://localhost:8091/v1/videos/${video_id} | jq .

List jobs

curl -s http://localhost:8091/v1/videos | jq .

Download the completed video

curl -L http://localhost:8091/v1/videos/${video_id}/content -o wan22_i2v_output.mp4

Delete a job and its stored file

curl -s -X DELETE "http://localhost:8091/v1/videos/${video_id}" | jq .

Poll Until Complete

while true; do
  status=$(curl -s http://localhost:8091/v1/videos/${video_id} | jq -r '.status')
  if [ "$status" = "completed" ]; then
    break
  fi
  if [ "$status" = "failed" ]; then
    echo "Video generation failed"
    exit 1
  fi
  sleep 2
done
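The loop above polls until the job reaches completed or failed, with no upper bound on the wait. A variant with a deadline might look like the sketch below. The names fetch_status, wait_for_video, and BASE_URL, and the 600-second default, are assumptions made for illustration; the queued and in_progress status values are taken from the example scripts on this page.

```shell
# fetch_status VIDEO_ID: print the job's current status string.
# BASE_URL is an assumed variable; it defaults to the port used on this page.
fetch_status() {
  curl -sS "${BASE_URL:-http://localhost:8091}/v1/videos/$1" | jq -r '.status'
}

# wait_for_video VIDEO_ID [TIMEOUT_S] [INTERVAL_S]
# Returns 0 when the job completes, 1 when it fails or reports an unexpected
# status, and 2 once roughly TIMEOUT_S seconds have been spent sleeping
# between polls. INTERVAL_S must be a whole number of seconds.
wait_for_video() {
  local video_id timeout_s interval_s waited status
  video_id="$1"
  timeout_s="${2:-600}"
  interval_s="${3:-2}"
  waited=0
  while :; do
    status="$(fetch_status "$video_id")"
    case "$status" in
      completed) return 0 ;;
      queued|in_progress) ;;
      failed) echo "Video generation failed" >&2; return 1 ;;
      *) echo "Unexpected status: $status" >&2; return 1 ;;
    esac
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "Timed out after ${waited}s waiting for $video_id" >&2
      return 2
    fi
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
}
```

Distinct return codes let a calling script tell a failed job (1) from a slow one (2), for example to keep the job and retry the wait later instead of deleting it.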

Example materials

run_curl_hunyuan_video_15.sh

#!/bin/bash
# HunyuanVideo-1.5 image-to-video curl example using the async video job API.

set -euo pipefail

INPUT_IMAGE="${INPUT_IMAGE:-test_input.jpg}"
BASE_URL="${BASE_URL:-http://localhost:8099}"
OUTPUT_PATH="${OUTPUT_PATH:-hunyuan_video_15_i2v.mp4}"
POLL_INTERVAL="${POLL_INTERVAL:-2}"

if [ ! -f "$INPUT_IMAGE" ]; then
    echo "Input image not found: $INPUT_IMAGE"
    echo "Provide an image via INPUT_IMAGE env var."
    exit 1
fi

create_response=$(
  curl -sS -X POST "${BASE_URL}/v1/videos" \
    -H "Accept: application/json" \
    -F "prompt=The camera follows the puppy as it runs forward on the grass, its four legs alternating steps, its tail held high and wagging side to side." \
    -F "input_reference=@${INPUT_IMAGE}" \
    -F "size=832x480" \
    -F "num_frames=33" \
    -F "fps=24" \
    -F "num_inference_steps=30" \
    -F "guidance_scale=6.0" \
    -F "flow_shift=5.0" \
    -F "seed=42"
)

video_id="$(echo "${create_response}" | jq -r '.id')"
if [ -z "${video_id}" ] || [ "${video_id}" = "null" ]; then
  echo "Failed to create video job:"
  echo "${create_response}" | jq .
  exit 1
fi

echo "Created video job ${video_id}"
echo "${create_response}" | jq .

while true; do
  status_response="$(curl -sS "${BASE_URL}/v1/videos/${video_id}")"
  status="$(echo "${status_response}" | jq -r '.status')"

  case "${status}" in
    queued|in_progress)
      echo "Video job ${video_id} status: ${status}"
      sleep "${POLL_INTERVAL}"
      ;;
    completed)
      echo "${status_response}" | jq .
      break
      ;;
    failed)
      echo "Video generation failed:"
      echo "${status_response}" | jq .
      exit 1
      ;;
    *)
      echo "Unexpected status response:"
      echo "${status_response}" | jq .
      exit 1
      ;;
  esac
done

curl -sS -L "${BASE_URL}/v1/videos/${video_id}/content" -o "${OUTPUT_PATH}"
echo "Saved video to ${OUTPUT_PATH}"

run_curl_image_to_video.sh

#!/bin/bash
# Wan2.2 image-to-video curl example using the async video job API.

set -euo pipefail

INPUT_IMAGE="${INPUT_IMAGE:-../../offline_inference/image_to_video/qwen-bear.png}"
BASE_URL="${BASE_URL:-http://localhost:8099}"
OUTPUT_PATH="${OUTPUT_PATH:-wan22_i2v_output.mp4}"
NEGATIVE_PROMPT="${NEGATIVE_PROMPT:-}"
SAMPLE_SOLVER="${SAMPLE_SOLVER:-}"
POLL_INTERVAL="${POLL_INTERVAL:-2}"

if [ ! -f "$INPUT_IMAGE" ]; then
    echo "Input image not found: $INPUT_IMAGE"
    exit 1
fi

create_cmd=(
  curl -sS -X POST "${BASE_URL}/v1/videos"
  -H "Accept: application/json"
  -F "prompt=A bear playing with yarn, smooth motion"
  -F "input_reference=@${INPUT_IMAGE}"
  -F "seconds=2"
  -F "size=832x480"
  -F "fps=16"
  -F "num_inference_steps=40"
  -F "guidance_scale=1.0"
  -F "guidance_scale_2=1.0"
  -F "boundary_ratio=0.875"
  -F "flow_shift=12.0"
  -F "seed=42"
)

if [ -n "${NEGATIVE_PROMPT}" ]; then
  create_cmd+=(-F "negative_prompt=${NEGATIVE_PROMPT}")
fi

if [ -n "${SAMPLE_SOLVER}" ]; then
  create_cmd+=(-F "extra_params={\"sample_solver\":\"${SAMPLE_SOLVER}\"}")
fi

create_response="$("${create_cmd[@]}")"
video_id="$(echo "${create_response}" | jq -r '.id')"
if [ -z "${video_id}" ] || [ "${video_id}" = "null" ]; then
  echo "Failed to create video job:"
  echo "${create_response}" | jq .
  exit 1
fi

echo "Created video job ${video_id}"
echo "${create_response}" | jq .

while true; do
  status_response="$(curl -sS "${BASE_URL}/v1/videos/${video_id}")"
  status="$(echo "${status_response}" | jq -r '.status')"

  case "${status}" in
    queued|in_progress)
      echo "Video job ${video_id} status: ${status}"
      sleep "${POLL_INTERVAL}"
      ;;
    completed)
      echo "${status_response}" | jq .
      break
      ;;
    failed)
      echo "Video generation failed:"
      echo "${status_response}" | jq .
      exit 1
      ;;
    *)
      echo "Unexpected status response:"
      echo "${status_response}" | jq .
      exit 1
      ;;
  esac
done

curl -sS -L "${BASE_URL}/v1/videos/${video_id}/content" -o "${OUTPUT_PATH}"
echo "Saved video to ${OUTPUT_PATH}"

run_server.sh

#!/bin/bash
# Wan2.2 image-to-video server start script

MODEL="${MODEL:-Wan-AI/Wan2.2-I2V-A14B-Diffusers}"
PORT="${PORT:-8099}"
CACHE_BACKEND="${CACHE_BACKEND:-none}"
ENABLE_CACHE_DIT_SUMMARY="${ENABLE_CACHE_DIT_SUMMARY:-0}"

echo "Starting Wan2.2 I2V server..."
echo "Model: $MODEL"
echo "Port: $PORT"
echo "Cache backend: $CACHE_BACKEND"
if [ "$ENABLE_CACHE_DIT_SUMMARY" != "0" ]; then
    echo "Cache-DiT summary: enabled"
fi

CACHE_BACKEND_FLAG=""
if [ "$CACHE_BACKEND" != "none" ]; then
    CACHE_BACKEND_FLAG="--cache-backend $CACHE_BACKEND"
fi

vllm serve "$MODEL" --omni \
    --port "$PORT" \
    $CACHE_BACKEND_FLAG \
    $(if [ "$ENABLE_CACHE_DIT_SUMMARY" != "0" ]; then echo "--enable-cache-dit-summary"; fi)

run_server_hunyuan_video_15.sh

#!/bin/bash
# HunyuanVideo-1.5 image-to-video online serving startup script
#
# 480p: ~35 GB VRAM (BF16), fits 1x A100 80GB
# 720p: needs FP8 + VAE tiling, ~35 GB VRAM

MODEL="${MODEL:-hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v}"
PORT="${PORT:-8099}"
FLOW_SHIFT="${FLOW_SHIFT:-5.0}"
QUANTIZATION="${QUANTIZATION:-}"
CACHE_BACKEND="${CACHE_BACKEND:-none}"

echo "Starting HunyuanVideo-1.5 I2V server..."
echo "Model: $MODEL"
echo "Port: $PORT"
echo "Flow shift: $FLOW_SHIFT"
echo "Quantization: ${QUANTIZATION:-none}"
echo "Cache backend: $CACHE_BACKEND"

EXTRA_FLAGS=""
if [ -n "$QUANTIZATION" ]; then
    EXTRA_FLAGS="$EXTRA_FLAGS --quantization $QUANTIZATION"
fi
if [ "$CACHE_BACKEND" != "none" ]; then
    EXTRA_FLAGS="$EXTRA_FLAGS --cache-backend $CACHE_BACKEND"
fi

vllm serve "$MODEL" --omni \
    --port "$PORT" \
    --flow-shift "$FLOW_SHIFT" \
    $EXTRA_FLAGS