
Image-To-Video

Source: https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving/image_to_video

This example demonstrates how to deploy the Wan2.2 image-to-video model for online video generation using vLLM-Omni.

Start Server

Basic Start

vllm serve Wan-AI/Wan2.2-I2V-A14B-Diffusers --omni --port 8091

Start with Parameters

Or use the startup script:

bash run_server.sh

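The script reads these environment variables:

  • MODEL (default: Wan-AI/Wan2.2-I2V-A14B-Diffusers)
  • PORT (default: 8099)
  • CACHE_BACKEND (default: none; any other value is passed as --cache-backend)
  • ENABLE_CACHE_DIT_SUMMARY (default: 0; any other value adds --enable-cache-dit-summary)

The script does not read BOUNDARY_RATIO or FLOW_SHIFT. Pass boundary_ratio and flow_shift as request fields instead, or start vllm serve directly with --flow-shift. Its default port (8099) matches the curl scripts under Example materials but not the 8091 used in the commands below, so run PORT=8091 bash run_server.sh to follow them.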
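The environment handling in run_server.sh (the version shown under Example materials, including its default port of 8099) can be previewed with a dry run that builds the vllm serve command without launching it. This is a sketch: build_serve_cmd is a hypothetical helper, and it forwards any cache backend name unchecked.

```shell
#!/usr/bin/env bash
# Dry-run sketch of run_server.sh's environment handling.
# build_serve_cmd is hypothetical: it prints the vllm serve command instead of running it.
build_serve_cmd() {
  local model="${MODEL:-Wan-AI/Wan2.2-I2V-A14B-Diffusers}"
  local port="${PORT:-8099}"
  local cache_backend="${CACHE_BACKEND:-none}"
  local cache_summary="${ENABLE_CACHE_DIT_SUMMARY:-0}"

  # Collect arguments in an array so optional flags never rely on word splitting.
  local -a cmd=(vllm serve "$model" --omni --port "$port")
  if [ "$cache_backend" != "none" ]; then
    cmd+=(--cache-backend "$cache_backend")
  fi
  if [ "$cache_summary" != "0" ]; then
    cmd+=(--enable-cache-dit-summary)
  fi

  # Print the command on one line; running "${cmd[@]}" instead would launch it.
  printf '%s\n' "${cmd[*]}"
}
```

For example, PORT=8091 build_serve_cmd prints vllm serve Wan-AI/Wan2.2-I2V-A14B-Diffusers --omni --port 8091, which is the basic start command above.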

Ascend / Local LightX2V Example

For a local Wan2.2-LightX2V Diffusers directory on Ascend/NPU, you can start the server like this:

vllm serve /path/to/Wan2.2-I2V-A14B-LightX2V-Diffusers-Lightning \
  --omni \
  --port 8091 \
  --flow-shift 12 \
  --cfg-parallel-size 1 \
  --ulysses-degree 4 \
  --use-hsdp \
  --trust-remote-code \
  --allowed-local-media-path / \
  --seed 42

Async Job Behavior

POST /v1/videos is asynchronous: it creates a video job and immediately returns its metadata, such as the job ID and the initial queued status. To get the final artifact, poll the job status until it is completed, then download the file from the content endpoint.

The main endpoints are:

  • POST /v1/videos: create a video generation job (async)
  • POST /v1/videos/sync: generate a video and return the raw bytes (sync, for benchmarks)
  • GET /v1/videos/{video_id}: retrieve the current job status and metadata
  • GET /v1/videos: list stored video jobs
  • GET /v1/videos/{video_id}/content: download the generated video file
  • DELETE /v1/videos/{video_id}: delete the job and any stored output
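Client scripts that call several of these endpoints can keep the method and path for each operation in one place. The sketch below is a hypothetical helper (video_endpoint is not part of vLLM-Omni). It prints the method and path for an operation exactly as listed above and refuses job-specific operations when no ID is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "METHOD PATH" for a video API operation.
# Usage: video_endpoint <create|sync|list|retrieve|content|delete> [video_id]
video_endpoint() {
  local op="$1" id="${2:-}"
  case "$op" in
    create) echo "POST /v1/videos" ;;
    sync)   echo "POST /v1/videos/sync" ;;
    list)   echo "GET /v1/videos" ;;
    retrieve|content|delete)
      # These operations act on a single job and need its ID.
      if [ -z "$id" ]; then
        echo "video_endpoint: '$op' requires a video_id" >&2
        return 1
      fi
      case "$op" in
        retrieve) echo "GET /v1/videos/${id}" ;;
        content)  echo "GET /v1/videos/${id}/content" ;;
        delete)   echo "DELETE /v1/videos/${id}" ;;
      esac
      ;;
    *)
      echo "video_endpoint: unknown operation '$op'" >&2
      return 1
      ;;
  esac
}
```

For example, video_endpoint content video_gen_123 prints GET /v1/videos/video_gen_123/content. You can split the output with read -r method path and pass the parts to curl -X "$method" "http://localhost:8091${path}".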

Sync API (Benchmark / Testing)

POST /v1/videos/sync is a synchronous alternative that blocks until generation completes and returns the raw video bytes (video/mp4) directly in the response body. It is designed for benchmark and testing scenarios where one-shot request/response latency measurement is needed.

The sync endpoint accepts the same form parameters as POST /v1/videos. It does not create a stored job record; the response body is just the generated video file. Metadata is returned in response headers:

  • X-Request-Id: unique identifier for this generation request
  • X-Model: model name used for generation
  • X-Inference-Time-S: wall-clock inference time in seconds

Example:

curl -X POST http://localhost:8091/v1/videos/sync \
  -F "prompt=A bear playing with yarn, smooth motion" \
  -F "input_reference=@/path/to/input.png" \
  -F "size=832x480" \
  -F "num_frames=33" \
  -F "fps=16" \
  -F "negative_prompt=low quality, blurry, static" \
  -F "num_inference_steps=40" \
  -F "guidance_scale=1.0" \
  -F "guidance_scale_2=1.0" \
  -F "boundary_ratio=0.875" \
  -F "flow_shift=12.0" \
  -F 'extra_params={"sample_solver":"euler"}' \
  -F "seed=42" \
  -o sync_i2v_output.mp4

The extra_params field above selects the euler solver, which is intended for Wan Lightning/Distill checkpoints. The default solver is unipc; omit extra_params to use it.

Example matching the local LightX2V deployment above:

curl -sS -X POST http://localhost:8091/v1/videos/sync \
  -H "Accept: video/mp4" \
  -F "prompt=A cat playing with yarn" \
  -F "input_reference=@/path/to/input.jpg" \
  -F "width=832" \
  -F "height=480" \
  -F "num_frames=81" \
  -F "fps=16" \
  -F "num_inference_steps=4" \
  -F "guidance_scale=1.0" \
  -F "guidance_scale_2=1.0" \
  -F "boundary_ratio=0.875" \
  -F "seed=42" \
  -F 'extra_params={"sample_solver":"euler"}' \
  -o ./output.mp4

Use POST /v1/videos/sync when you want the MP4 written directly to a file in a single request. POST /v1/videos is asynchronous and returns job metadata, not inline b64_json video data.
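You can capture the sync endpoint's metadata headers (X-Request-Id, X-Model, X-Inference-Time-S) alongside the video with curl's -D option, which dumps the response headers to a file while -o writes the MP4. The function below is a minimal sketch for reading one value back from such a dump; read_response_header is a hypothetical name. HTTP header names are case-insensitive, so it matches them that way. It also strips the trailing carriage return and keeps the last match in case the dump contains more than one response block (for example an interim 100 Continue).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one header value from a file written by `curl -D <file>`.
# Usage: read_response_header <header-name> <header-file>
# Returns non-zero if the header is missing or empty.
read_response_header() {
  local name="$1" file="$2"
  awk -v want="$name" '
    BEGIN { want = tolower(want) }
    {
      sub(/\r$/, "")                       # drop the CR of CRLF line endings
      i = index($0, ":")
      if (i == 0) next                     # status line or blank separator
      key = tolower(substr($0, 1, i - 1))
      if (key == want) {
        val = substr($0, i + 1)
        sub(/^[ \t]+/, "", val)            # trim leading whitespace
        value = val                        # keep the last match (final response)
      }
    }
    END { if (value == "") exit 1; print value }
  ' "$file"
}
```

For example, add -D sync_headers.txt to the sync request above, then run read_response_header X-Inference-Time-S sync_headers.txt.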

Storage

The async video API stores generated video files on local disk. Storage behavior is controlled by the following environment variables:

  • VLLM_OMNI_STORAGE_PATH: directory used for generated files (default: /tmp/storage)
  • VLLM_OMNI_STORAGE_MAX_CONCURRENCY: max concurrent save/delete operations (default: 4)

For example, export them in the shell that starts the server:

export VLLM_OMNI_STORAGE_PATH=/var/tmp/vllm-omni-videos
export VLLM_OMNI_STORAGE_MAX_CONCURRENCY=8
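The async API writes finished videos to this directory, so a custom VLLM_OMNI_STORAGE_PATH should exist and be writable by the user that runs the server. The hypothetical preflight helper below resolves the same defaults listed above, creates the directory if needed, and checks both settings. The positive-integer check on the concurrency limit is an assumption. This guide does not say whether the server creates the directory on its own.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check for the async video API's local storage.
# Uses the documented defaults; prints the effective settings on success.
check_video_storage() {
  local path="${VLLM_OMNI_STORAGE_PATH:-/tmp/storage}"
  local max_conc="${VLLM_OMNI_STORAGE_MAX_CONCURRENCY:-4}"

  # Sanity check (assumption): the concurrency limit should be a positive integer.
  case "$max_conc" in
    ''|*[!0-9]*)
      echo "VLLM_OMNI_STORAGE_MAX_CONCURRENCY must be a positive integer, got '$max_conc'" >&2
      return 1
      ;;
  esac
  if [ "$max_conc" -lt 1 ]; then
    echo "VLLM_OMNI_STORAGE_MAX_CONCURRENCY must be at least 1" >&2
    return 1
  fi

  # Create the directory if needed and confirm the current user can write to it.
  mkdir -p "$path" || return 1
  if [ ! -w "$path" ]; then
    echo "Storage path is not writable: $path" >&2
    return 1
  fi

  echo "storage_path=$path"
  echo "max_concurrency=$max_conc"
}
```

Run it as the same user that will start vllm serve, after the exports above.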

API Calls

Method 1: Using curl

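# Basic image-to-video generation. The script defaults to
# BASE_URL=http://localhost:8099; override it to match a server on port 8091.
BASE_URL=http://localhost:8091 bash run_curl_image_to_video.sh

# Wan Lightning/Distill checkpoints
BASE_URL=http://localhost:8091 SAMPLE_SOLVER=euler bash run_curl_image_to_video.sh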

# Or execute directly (OpenAI-style multipart)
create_response=$(curl -s http://localhost:8091/v1/videos \
  -H "Accept: application/json" \
  -F "prompt=A bear playing with yarn, smooth motion" \
  -F "negative_prompt=low quality, blurry, static" \
  -F "input_reference=@/path/to/qwen-bear.png" \
  -F "width=832" \
  -F "height=480" \
  -F "num_frames=33" \
  -F "fps=16" \
  -F "num_inference_steps=40" \
  -F "guidance_scale=1.0" \
  -F "guidance_scale_2=1.0" \
  -F "boundary_ratio=0.875" \
  -F "flow_shift=12.0" \
  -F 'extra_params={"sample_solver":"euler"}' \
  -F "seed=42")

video_id=$(echo "$create_response" | jq -r '.id')
while true; do
  status=$(curl -s "http://localhost:8091/v1/videos/${video_id}" | jq -r '.status')
  case "$status" in
    completed) break ;;
    queued|in_progress) sleep 2 ;;
    failed) echo "Video generation failed"; exit 1 ;;
    # Stop on anything else (e.g. an error response) instead of looping forever.
    *) echo "Unexpected status: ${status}"; exit 1 ;;
  esac
done

curl -s "http://localhost:8091/v1/videos/${video_id}" | jq .
curl -L "http://localhost:8091/v1/videos/${video_id}/content" -o wan22_i2v_output.mp4

Request Format

Required Fields

curl -X POST http://localhost:8091/v1/videos \
  -F "prompt=A bear playing with yarn, smooth motion" \
  -F "input_reference=@/path/to/qwen-bear.png"

Alternative JSON-Safe Reference Input

Use image_reference when you want to pass a URL or JSON-safe image reference instead of uploading a file. Do not send input_reference and image_reference together.

curl -X POST http://localhost:8091/v1/videos \
  -F "prompt=A bear playing with yarn, smooth motion" \
  -F 'image_reference={"image_url":"https://example.com/qwen-bear.png"}'
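input_reference and image_reference must not be sent together, so a client can check its arguments before it builds the request. This is a minimal sketch that assumes an image-to-video request needs exactly one of the two, as the examples above suggest. build_reference_field is a hypothetical helper that prints the matching curl -F value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose exactly one image source for an I2V request.
# Usage: build_reference_field <local-image-path-or-empty> <image-url-or-empty>
# Prints a single curl -F value on success.
build_reference_field() {
  local file="$1" url="$2"
  if [ -n "$file" ] && [ -n "$url" ]; then
    echo "Use either input_reference or image_reference, not both" >&2
    return 1
  fi
  if [ -n "$file" ]; then
    if [ ! -f "$file" ]; then
      echo "Input image not found: $file" >&2
      return 1
    fi
    # Multipart upload of a local file.
    printf 'input_reference=@%s\n' "$file"
  elif [ -n "$url" ]; then
    # The URL is embedded as-is in JSON, so reject characters that would break it.
    case "$url" in
      *'"'*|*'\'*)
        echo "URL must not contain double quotes or backslashes" >&2
        return 1
        ;;
    esac
    printf 'image_reference={"image_url":"%s"}\n' "$url"
  else
    echo "An input image is required (input_reference or image_reference)" >&2
    return 1
  fi
}
```

For example, ref="$(build_reference_field "" "https://example.com/qwen-bear.png")" followed by curl -X POST http://localhost:8091/v1/videos -F "prompt=..." -F "$ref" sends the JSON-safe form shown above.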

Generation with Parameters

curl -X POST http://localhost:8091/v1/videos \
  -F "prompt=A bear playing with yarn, smooth motion" \
  -F "negative_prompt=low quality, blurry, static" \
  -F "input_reference=@/path/to/qwen-bear.png" \
  -F "width=832" \
  -F "height=480" \
  -F "num_frames=33" \
  -F "fps=16" \
  -F "num_inference_steps=40" \
  -F "guidance_scale=1.0" \
  -F "guidance_scale_2=1.0" \
  -F "boundary_ratio=0.875" \
  -F "flow_shift=12.0" \
  -F 'extra_params={"sample_solver":"euler"}' \
  -F "seed=42"
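The frame counts on this page (33 and 81) both have the form 4k + 1. This matches Wan's VAE, which compresses the time axis by a factor of 4. The Diffusers Wan pipelines adjust other values to num_frames // 4 * 4 + 1 and log a warning. This guide does not confirm that vLLM-Omni makes the same adjustment, so the hypothetical helpers below apply it on the client side before the request is sent. frames_for_duration is a further assumption: it treats a clip of S seconds at F fps as S × F frames before alignment. That is not necessarily how the server interprets the seconds field used in run_curl_image_to_video.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for picking Wan-compatible frame counts.
# Assumption: the VAE compresses time by a factor of 4, so valid counts are 4k + 1.
TEMPORAL_FACTOR=4

# Align a frame count with n / 4 * 4 + 1 (integer division): 34 -> 33, 80 -> 81.
align_num_frames() {
  case "$1" in
    ''|*[!0-9]*)
      echo "num_frames must be a whole number, got '$1'" >&2
      return 1
      ;;
  esac
  # 10# forces base 10 so inputs such as 08 are not parsed as octal.
  echo $(( 10#$1 / TEMPORAL_FACTOR * TEMPORAL_FACTOR + 1 ))
}

# Frame count for a clip length, assuming seconds * fps frames before alignment.
frames_for_duration() {
  local v
  for v in "$1" "$2"; do
    case "$v" in
      ''|*[!0-9]*)
        echo "seconds and fps must be whole numbers, got '$1' and '$2'" >&2
        return 1
        ;;
    esac
  done
  align_num_frames $(( 10#$1 * 10#$2 ))
}
```

Under these assumptions, frames_for_duration 2 16 gives 33 and frames_for_duration 5 16 gives 81, the two counts used in the examples above.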

Wan2.2 online serving accepts sample_solver through the extra_params field, whose contents are merged into the pipeline's extra_args. Use unipc for the default multistep solver, or euler for Lightning/Distill checkpoints.
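When the solver name comes from a variable, as SAMPLE_SOLVER does in run_curl_image_to_video.sh, validating it before it goes into the extra_params JSON catches typos on the client side. This minimal sketch accepts only the two solvers named on this page; the server may accept others. build_extra_params is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the extra_params JSON for a Wan2.2 sample_solver.
# Only the two solvers documented on this page are accepted.
build_extra_params() {
  local solver="${1:-unipc}"   # unipc is the documented default
  case "$solver" in
    unipc|euler)
      printf '{"sample_solver":"%s"}\n' "$solver"
      ;;
    *)
      echo "Unsupported sample_solver '$solver' (expected unipc or euler)" >&2
      return 1
      ;;
  esac
}
```

You can use it as -F "extra_params=$(build_extra_params euler)" in any of the requests above.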

Create Response Format

POST /v1/videos returns a job record, not inline base64 video data.

{
  "id": "video_gen_123",
  "object": "video",
  "status": "queued",
  "model": "Wan-AI/Wan2.2-I2V-A14B-Diffusers",
  "prompt": "A bear playing with yarn, smooth motion",
  "created_at": 1234567890
}

Retrieve, List, Download, and Delete

Retrieve a job

curl -s http://localhost:8091/v1/videos/${video_id} | jq .

List jobs

curl -s http://localhost:8091/v1/videos | jq .

Download the completed video

curl -L http://localhost:8091/v1/videos/${video_id}/content -o wan22_i2v_output.mp4

Delete a job and its stored file

curl -X DELETE http://localhost:8091/v1/videos/${video_id} | jq .

Poll Until Complete

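while true; do
  status=$(curl -s "http://localhost:8091/v1/videos/${video_id}" | jq -r '.status')
  case "$status" in
    completed) break ;;
    queued|in_progress) sleep 2 ;;
    failed) echo "Video generation failed"; exit 1 ;;
    # Stop on anything else (e.g. an error response) instead of looping forever.
    *) echo "Unexpected status: ${status}"; exit 1 ;;
  esac
done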
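Unattended scripts can also put a time limit on the wait. The sketch below wraps status polling in a function with an overall deadline. Any status other than queued, in_progress, completed, or failed is treated as an error; these are the values the example scripts handle. wait_for_video and its fetch-command argument are hypothetical. Because the fetch step is passed in as a command, the loop can be tested without a running server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a video job until it finishes or a deadline passes.
# Usage: wait_for_video <fetch-command> <timeout-seconds> [interval-seconds]
# <fetch-command> must print the job's current status.
# Returns 0 on completed, 1 on failed or unexpected status, 2 on timeout.
wait_for_video() {
  local fetch="$1" timeout="$2" interval="${3:-2}"
  local deadline status
  deadline=$(( $(date +%s) + timeout ))

  while :; do
    # Treat a failing fetch (e.g. connection refused) as an empty status.
    status="$("$fetch")" || status=""
    case "$status" in
      completed)
        return 0
        ;;
      failed)
        echo "Video generation failed" >&2
        return 1
        ;;
      queued|in_progress)
        ;;  # still running; check the deadline below
      *)
        echo "Unexpected status: '${status}'" >&2
        return 1
        ;;
    esac
    # Give up if the next sleep would reach the deadline.
    if [ $(( $(date +%s) + interval )) -ge "$deadline" ]; then
      echo "Timed out after ${timeout}s (last status: ${status})" >&2
      return 2
    fi
    sleep "$interval"
  done
}
```

With the server from this guide, the fetch step can be fetch_status() { curl -sS "http://localhost:8091/v1/videos/${video_id}" | jq -r '.status'; }. After defining it, wait_for_video fetch_status 1800 waits at most 30 minutes before you download from the content endpoint.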

Example materials

run_curl_hunyuan_video_15.sh
#!/bin/bash
# HunyuanVideo-1.5 image-to-video curl example using the async video job API.

set -euo pipefail

INPUT_IMAGE="${INPUT_IMAGE:-test_input.jpg}"
BASE_URL="${BASE_URL:-http://localhost:8099}"
OUTPUT_PATH="${OUTPUT_PATH:-hunyuan_video_15_i2v.mp4}"
POLL_INTERVAL="${POLL_INTERVAL:-2}"

if [ ! -f "$INPUT_IMAGE" ]; then
    echo "Input image not found: $INPUT_IMAGE"
    echo "Provide an image via INPUT_IMAGE env var."
    exit 1
fi

create_response=$(
  curl -sS -X POST "${BASE_URL}/v1/videos" \
    -H "Accept: application/json" \
    -F "prompt=The camera follows the puppy as it runs forward on the grass, its four legs alternating steps, its tail held high and wagging side to side." \
    -F "input_reference=@${INPUT_IMAGE}" \
    -F "size=832x480" \
    -F "num_frames=33" \
    -F "fps=24" \
    -F "num_inference_steps=30" \
    -F "guidance_scale=6.0" \
    -F "flow_shift=5.0" \
    -F "seed=42"
)

video_id="$(echo "${create_response}" | jq -r '.id')"
if [ -z "${video_id}" ] || [ "${video_id}" = "null" ]; then
  echo "Failed to create video job:"
  echo "${create_response}" | jq .
  exit 1
fi

echo "Created video job ${video_id}"
echo "${create_response}" | jq .

while true; do
  status_response="$(curl -sS "${BASE_URL}/v1/videos/${video_id}")"
  status="$(echo "${status_response}" | jq -r '.status')"

  case "${status}" in
    queued|in_progress)
      echo "Video job ${video_id} status: ${status}"
      sleep "${POLL_INTERVAL}"
      ;;
    completed)
      echo "${status_response}" | jq .
      break
      ;;
    failed)
      echo "Video generation failed:"
      echo "${status_response}" | jq .
      exit 1
      ;;
    *)
      echo "Unexpected status response:"
      echo "${status_response}" | jq .
      exit 1
      ;;
  esac
done

curl -sS -L "${BASE_URL}/v1/videos/${video_id}/content" -o "${OUTPUT_PATH}"
echo "Saved video to ${OUTPUT_PATH}"
run_curl_image_to_video.sh
#!/bin/bash
# Wan2.2 image-to-video curl example using the async video job API.

set -euo pipefail

INPUT_IMAGE="${INPUT_IMAGE:-../../offline_inference/image_to_video/qwen-bear.png}"
BASE_URL="${BASE_URL:-http://localhost:8099}"
OUTPUT_PATH="${OUTPUT_PATH:-wan22_i2v_output.mp4}"
NEGATIVE_PROMPT="${NEGATIVE_PROMPT:-}"
SAMPLE_SOLVER="${SAMPLE_SOLVER:-}"
POLL_INTERVAL="${POLL_INTERVAL:-2}"

if [ ! -f "$INPUT_IMAGE" ]; then
    echo "Input image not found: $INPUT_IMAGE"
    exit 1
fi

create_cmd=(
  curl -sS -X POST "${BASE_URL}/v1/videos"
  -H "Accept: application/json"
  -F "prompt=A bear playing with yarn, smooth motion"
  -F "input_reference=@${INPUT_IMAGE}"
  -F "seconds=2"
  -F "size=832x480"
  -F "fps=16"
  -F "num_inference_steps=40"
  -F "guidance_scale=1.0"
  -F "guidance_scale_2=1.0"
  -F "boundary_ratio=0.875"
  -F "flow_shift=12.0"
  -F "seed=42"
)

if [ -n "${NEGATIVE_PROMPT}" ]; then
  create_cmd+=(-F "negative_prompt=${NEGATIVE_PROMPT}")
fi

if [ -n "${SAMPLE_SOLVER}" ]; then
  create_cmd+=(-F "extra_params={\"sample_solver\":\"${SAMPLE_SOLVER}\"}")
fi

create_response="$("${create_cmd[@]}")"
video_id="$(echo "${create_response}" | jq -r '.id')"
if [ -z "${video_id}" ] || [ "${video_id}" = "null" ]; then
  echo "Failed to create video job:"
  echo "${create_response}" | jq .
  exit 1
fi

echo "Created video job ${video_id}"
echo "${create_response}" | jq .

while true; do
  status_response="$(curl -sS "${BASE_URL}/v1/videos/${video_id}")"
  status="$(echo "${status_response}" | jq -r '.status')"

  case "${status}" in
    queued|in_progress)
      echo "Video job ${video_id} status: ${status}"
      sleep "${POLL_INTERVAL}"
      ;;
    completed)
      echo "${status_response}" | jq .
      break
      ;;
    failed)
      echo "Video generation failed:"
      echo "${status_response}" | jq .
      exit 1
      ;;
    *)
      echo "Unexpected status response:"
      echo "${status_response}" | jq .
      exit 1
      ;;
  esac
done

curl -sS -L "${BASE_URL}/v1/videos/${video_id}/content" -o "${OUTPUT_PATH}"
echo "Saved video to ${OUTPUT_PATH}"
run_server.sh
#!/bin/bash
# Wan2.2 image-to-video server start script

MODEL="${MODEL:-Wan-AI/Wan2.2-I2V-A14B-Diffusers}"
PORT="${PORT:-8099}"
CACHE_BACKEND="${CACHE_BACKEND:-none}"
ENABLE_CACHE_DIT_SUMMARY="${ENABLE_CACHE_DIT_SUMMARY:-0}"

echo "Starting Wan2.2 I2V server..."
echo "Model: $MODEL"
echo "Port: $PORT"
echo "Cache backend: $CACHE_BACKEND"
if [ "$ENABLE_CACHE_DIT_SUMMARY" != "0" ]; then
    echo "Cache-DiT summary: enabled"
fi

# Optional flags are collected in an array so each stays a single argument.
EXTRA_ARGS=()
if [ "$CACHE_BACKEND" != "none" ]; then
    EXTRA_ARGS+=(--cache-backend "$CACHE_BACKEND")
fi
if [ "$ENABLE_CACHE_DIT_SUMMARY" != "0" ]; then
    EXTRA_ARGS+=(--enable-cache-dit-summary)
fi

vllm serve "$MODEL" --omni \
    --port "$PORT" \
    "${EXTRA_ARGS[@]}"
run_server_hunyuan_video_15.sh
#!/bin/bash
# HunyuanVideo-1.5 image-to-video online serving startup script
#
# 480p: ~35 GB VRAM (BF16), fits 1x A100 80GB
# 720p: needs FP8 + VAE tiling, ~35 GB VRAM

MODEL="${MODEL:-hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v}"
PORT="${PORT:-8099}"
FLOW_SHIFT="${FLOW_SHIFT:-5.0}"
QUANTIZATION="${QUANTIZATION:-}"
CACHE_BACKEND="${CACHE_BACKEND:-none}"

echo "Starting HunyuanVideo-1.5 I2V server..."
echo "Model: $MODEL"
echo "Port: $PORT"
echo "Flow shift: $FLOW_SHIFT"
echo "Quantization: ${QUANTIZATION:-none}"
echo "Cache backend: $CACHE_BACKEND"

# Optional flags are collected in an array so each stays a single argument.
EXTRA_ARGS=()
if [ -n "$QUANTIZATION" ]; then
    EXTRA_ARGS+=(--quantization "$QUANTIZATION")
fi
if [ "$CACHE_BACKEND" != "none" ]; then
    EXTRA_ARGS+=(--cache-backend "$CACHE_BACKEND")
fi

vllm serve "$MODEL" --omni \
    --port "$PORT" \
    --flow-shift "$FLOW_SHIFT" \
    "${EXTRA_ARGS[@]}"