Image-To-Video¶
Source: https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving/image_to_video
This example demonstrates how to deploy the Wan2.2 image-to-video model for online video generation using vLLM-Omni.
Start Server¶
Basic Start¶
Start with Parameters¶
Or use the startup script:
The script allows overriding the following environment variables:
- MODEL (default: Wan-AI/Wan2.2-I2V-A14B-Diffusers)
- PORT (default: 8091)
- BOUNDARY_RATIO (default: 0.875)
- FLOW_SHIFT (default: 12.0)
- CACHE_BACKEND (default: none)
- ENABLE_CACHE_DIT_SUMMARY (default: 0)
Ascend / Local LightX2V Example¶
For a local Wan2.2-LightX2V Diffusers directory on Ascend/NPU, you can start the server like this:
vllm serve /path/to/Wan2.2-I2V-A14B-LightX2V-Diffusers-Lightning \
--omni \
--port 8091 \
--flow-shift 12 \
--cfg-parallel-size 1 \
--ulysses-degree 4 \
--use-hsdp \
--trust-remote-code \
--allowed-local-media-path / \
--seed 42
Async Job Behavior¶
POST /v1/videos is asynchronous. It creates a video job and immediately returns metadata such as the job ID and an initial status of queued. To get the final artifact, poll the job status until it reaches completed, then download the file from the content endpoint.
The main endpoints are:
- POST /v1/videos: create a video generation job (async)
- POST /v1/videos/sync: generate a video and return raw bytes (sync, for benchmarks)
- GET /v1/videos/{video_id}: retrieve the current job status and metadata
- GET /v1/videos: list stored video jobs
- GET /v1/videos/{video_id}/content: download the generated video file
- DELETE /v1/videos/{video_id}: delete the job and any stored output
Sync API (Benchmark / Testing)¶
POST /v1/videos/sync is a synchronous alternative that blocks until generation completes and returns the raw video bytes (video/mp4) directly in the response body. It is designed for benchmark and testing scenarios where one-shot request/response latency measurement is needed.
The sync endpoint accepts the same form parameters as POST /v1/videos. It does not create a stored job record; the response body contains only the generated video file. Metadata is returned in response headers:
- X-Request-Id: unique identifier for this generation request
- X-Model: model name used for generation
- X-Inference-Time-S: wall-clock inference time in seconds
curl -X POST http://localhost:8091/v1/videos/sync \
-F "prompt=A bear playing with yarn, smooth motion" \
-F "input_reference=@/path/to/input.png" \
-F "size=832x480" \
-F "num_frames=33" \
-F "fps=16" \
-F "negative_prompt=low quality, blurry, static" \
-F "num_inference_steps=40" \
-F "guidance_scale=1.0" \
-F "guidance_scale_2=1.0" \
-F "boundary_ratio=0.875" \
-F "flow_shift=12.0" \
-F 'extra_params={"sample_solver":"euler"}' \
-F "seed=42" \
-o sync_i2v_output.mp4
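Because the sync endpoint reports its metadata only in response headers, a benchmark script needs to capture them alongside the MP4. The following is a minimal sketch, assuming the headers were written to a file with curl's `-D` option; `header_value` and the file name `sync_headers.txt` are illustrative names, not part of vLLM-Omni.

```shell
# Print the value of one response header from a header dump written by `curl -D`.
# Header names are matched case-insensitively and the trailing CR is stripped.
header_value() {
  local name="$1" file="$2"
  awk -v want="$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')" '
    {
      line = $0
      sub(/\r$/, "", line)                     # HTTP header lines end in CRLF
      i = index(line, ":")
      if (i > 0 && tolower(substr(line, 1, i - 1)) == want) {
        val = substr(line, i + 1)
        sub(/^[ \t]+/, "", val)                # drop whitespace after the colon
        print val
        exit
      }
    }
  ' "$file"
}

# Usage against a running server (not executed here):
#   curl -sS -X POST http://localhost:8091/v1/videos/sync \
#     -D sync_headers.txt \
#     -F "prompt=A bear playing with yarn, smooth motion" \
#     -F "input_reference=@/path/to/input.png" \
#     -o sync_i2v_output.mp4
#   header_value "X-Inference-Time-S" sync_headers.txt
```

Parsing a saved header file keeps the video bytes and the metadata in separate outputs, so the MP4 written by `-o` is not mixed with header text.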
For Wan Lightning/Distill checkpoints, pass {"sample_solver":"euler"} via extra_params. The default solver is unipc.
Example matching the local LightX2V deployment above:
curl -sS -X POST http://localhost:8091/v1/videos/sync \
-H "Accept: video/mp4" \
-F "prompt=A cat playing with yarn" \
-F "input_reference=@/path/to/input.jpg" \
-F "width=832" \
-F "height=480" \
-F "num_frames=81" \
-F "fps=16" \
-F "num_inference_steps=4" \
-F "guidance_scale=1.0" \
-F "guidance_scale_2=1.0" \
-F "boundary_ratio=0.875" \
-F "seed=42" \
-F 'extra_params={"sample_solver":"euler"}' \
-o ./output.mp4
Use /v1/videos/sync if you want to write the MP4 directly to a file. POST /v1/videos is async and returns job metadata, not inline b64_json.
Storage¶
Generated video files are stored on local disk by the async video API. Local file storage behavior can be controlled via the following environment variables:
- VLLM_OMNI_STORAGE_PATH: directory used for generated files (default: /tmp/storage)
- VLLM_OMNI_STORAGE_MAX_CONCURRENCY: max concurrent save/delete operations (default: 4)
Example:
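A minimal sketch of setting both storage variables before launching the server. The directory name and the concurrency value of 8 are arbitrary illustrations, and creating the directory in advance is a precaution in this sketch rather than a documented requirement.

```shell
# Hypothetical storage location; any writable directory should work.
export VLLM_OMNI_STORAGE_PATH="${TMPDIR:-/tmp}/vllm_omni_videos"
export VLLM_OMNI_STORAGE_MAX_CONCURRENCY=8

# Create the directory up front so a permissions problem surfaces before the server starts.
mkdir -p "$VLLM_OMNI_STORAGE_PATH"

# Then start the server from the same shell so it inherits both variables, e.g.:
#   bash run_server.sh
```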
API Calls¶
Method 1: Using curl¶
# Basic image-to-video generation
bash run_curl_image_to_video.sh
# Wan Lightning/Distill checkpoints
SAMPLE_SOLVER=euler bash run_curl_image_to_video.sh
# Or execute directly (OpenAI-style multipart)
create_response=$(curl -s http://localhost:8091/v1/videos \
-H "Accept: application/json" \
-F "prompt=A bear playing with yarn, smooth motion" \
-F "negative_prompt=low quality, blurry, static" \
-F "input_reference=@/path/to/qwen-bear.png" \
-F "width=832" \
-F "height=480" \
-F "num_frames=33" \
-F "fps=16" \
-F "num_inference_steps=40" \
-F "guidance_scale=1.0" \
-F "guidance_scale_2=1.0" \
-F "boundary_ratio=0.875" \
-F "flow_shift=12.0" \
-F 'extra_params={"sample_solver":"euler"}' \
-F "seed=42")
video_id=$(echo "$create_response" | jq -r '.id')
while true; do
  status=$(curl -s "http://localhost:8091/v1/videos/${video_id}" | jq -r '.status')
  if [ "$status" = "completed" ]; then
    break
  fi
  if [ "$status" = "failed" ]; then
    echo "Video generation failed"
    exit 1
  fi
  sleep 2
done
curl -s "http://localhost:8091/v1/videos/${video_id}" | jq .
curl -L "http://localhost:8091/v1/videos/${video_id}/content" -o wan22_i2v_output.mp4
Request Format¶
Required Fields¶
curl -X POST http://localhost:8091/v1/videos \
-F "prompt=A bear playing with yarn, smooth motion" \
-F "negative_prompt=low quality, blurry, static" \
-F "input_reference=@/path/to/qwen-bear.png"
Alternative JSON-Safe Reference Input¶
Use image_reference when you want to pass a URL or JSON-safe image reference instead of uploading a file. Do not send input_reference and image_reference together.
curl -X POST http://localhost:8091/v1/videos \
-F "prompt=A bear playing with yarn, smooth motion" \
-F 'image_reference={"image_url":"https://example.com/qwen-bear.png"}'
Generation with Parameters¶
curl -X POST http://localhost:8091/v1/videos \
-F "prompt=A bear playing with yarn, smooth motion" \
-F "negative_prompt=low quality, blurry, static" \
-F "input_reference=@/path/to/qwen-bear.png" \
-F "width=832" \
-F "height=480" \
-F "num_frames=33" \
-F "fps=16" \
-F "num_inference_steps=40" \
-F "guidance_scale=1.0" \
-F "guidance_scale_2=1.0" \
-F "boundary_ratio=0.875" \
-F "flow_shift=12.0" \
-F 'extra_params={"sample_solver":"euler"}' \
-F "seed=42"
sample_solver is supported by Wan2.2 online serving through the existing extra_params field, which is merged into the pipeline extra_args. Use unipc for the default multistep solver, or euler for Lightning/Distill checkpoints.
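When the solver is chosen from a script variable, it can help to build the extra_params JSON in one place. The sketch below is an assumption-laden helper: `build_extra_params` is a hypothetical name, and it accepts only the two solver names mentioned in this guide, which is a restriction of the sketch and not necessarily of the server.

```shell
# Print the extra_params JSON for a given solver name (defaults to unipc).
# Rejects anything other than the two solvers named in this guide.
build_extra_params() {
  local solver="${1:-unipc}"
  case "$solver" in
    unipc|euler)
      printf '{"sample_solver":"%s"}\n' "$solver"
      ;;
    *)
      echo "Unsupported sample_solver: $solver" >&2
      return 1
      ;;
  esac
}

# Usage against a running server (not executed here):
#   curl -X POST http://localhost:8091/v1/videos \
#     -F "prompt=A bear playing with yarn, smooth motion" \
#     -F "input_reference=@/path/to/qwen-bear.png" \
#     -F "extra_params=$(build_extra_params euler)"
```

Validating the name before sending the request catches a typo locally, before the job is queued.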
Create Response Format¶
POST /v1/videos returns a job record, not inline base64 video data.
{
"id": "video_gen_123",
"object": "video",
"status": "queued",
"model": "Wan-AI/Wan2.2-I2V-A14B-Diffusers",
"prompt": "A bear playing with yarn, smooth motion",
"created_at": 1234567890
}
Retrieve, List, Download, and Delete¶
Retrieve a job¶
List jobs¶
Download the completed video¶
Delete a job and its stored file¶
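The four operations above correspond to the GET and DELETE endpoints listed under Async Job Behavior. The wrappers below are a sketch under that assumption; the function names and the `BASE_URL` variable are illustrative, and the port follows the 8091 used in the curl examples of this guide.

```shell
BASE_URL="${BASE_URL:-http://localhost:8091}"

# Retrieve the current status and metadata of one job.
retrieve_job() { curl -sS "${BASE_URL}/v1/videos/$1"; }

# List stored video jobs.
list_jobs() { curl -sS "${BASE_URL}/v1/videos"; }

# Download the generated file of a completed job to a local path.
download_video() { curl -sS -L "${BASE_URL}/v1/videos/$1/content" -o "$2"; }

# Delete a job and any stored output.
delete_job() { curl -sS -X DELETE "${BASE_URL}/v1/videos/$1"; }

# Usage against a running server (not executed here):
#   retrieve_job "$video_id" | jq .
#   list_jobs | jq .
#   download_video "$video_id" wan22_i2v_output.mp4
#   delete_job "$video_id"
```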
Poll Until Complete¶
while true; do
  status=$(curl -s "http://localhost:8091/v1/videos/${video_id}" | jq -r '.status')
  if [ "$status" = "completed" ]; then
    break
  fi
  if [ "$status" = "failed" ]; then
    echo "Video generation failed"
    exit 1
  fi
  sleep 2
done
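The loop above runs forever if the job never reaches completed or failed, for example when the server becomes unreachable. A bounded variant is sketched below; `fetch_status` and `wait_for_video`, the default of 300 attempts, and the return codes are choices made for this sketch and are not part of vLLM-Omni.

```shell
BASE_URL="${BASE_URL:-http://localhost:8091}"

# Fetch the current status string for one job. Kept separate so it can be replaced.
fetch_status() {
  curl -sS "${BASE_URL}/v1/videos/$1" | jq -r '.status'
}

# Poll until the job completes, fails, or max_attempts polls have been made.
# Returns 0 on completed, 1 on failed, 2 on timeout.
wait_for_video() {
  local video_id="$1" max_attempts="${2:-300}" interval="${3:-2}"
  local attempt=0 status=""
  while [ "$attempt" -lt "$max_attempts" ]; do
    status="$(fetch_status "$video_id")"
    case "$status" in
      completed)
        return 0
        ;;
      failed)
        echo "Video generation failed" >&2
        return 1
        ;;
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "Timed out waiting for ${video_id} (last status: ${status})" >&2
  return 2
}

# Usage against a running server (not executed here):
#   wait_for_video "$video_id" 300 2 && \
#     curl -L "${BASE_URL}/v1/videos/${video_id}/content" -o wan22_i2v_output.mp4
```

Returning distinct codes lets a calling script tell a failed generation apart from a timeout without parsing the messages.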
Example materials¶
run_curl_hunyuan_video_15.sh
#!/bin/bash
# HunyuanVideo-1.5 image-to-video curl example using the async video job API.
set -euo pipefail
INPUT_IMAGE="${INPUT_IMAGE:-test_input.jpg}"
BASE_URL="${BASE_URL:-http://localhost:8099}"
OUTPUT_PATH="${OUTPUT_PATH:-hunyuan_video_15_i2v.mp4}"
POLL_INTERVAL="${POLL_INTERVAL:-2}"
if [ ! -f "$INPUT_IMAGE" ]; then
  echo "Input image not found: $INPUT_IMAGE"
  echo "Provide an image via INPUT_IMAGE env var."
  exit 1
fi
create_response=$(
  curl -sS -X POST "${BASE_URL}/v1/videos" \
    -H "Accept: application/json" \
    -F "prompt=The camera follows the puppy as it runs forward on the grass, its four legs alternating steps, its tail held high and wagging side to side." \
    -F "input_reference=@${INPUT_IMAGE}" \
    -F "size=832x480" \
    -F "num_frames=33" \
    -F "fps=24" \
    -F "num_inference_steps=30" \
    -F "guidance_scale=6.0" \
    -F "flow_shift=5.0" \
    -F "seed=42"
)
video_id="$(echo "${create_response}" | jq -r '.id')"
if [ -z "${video_id}" ] || [ "${video_id}" = "null" ]; then
  echo "Failed to create video job:"
  echo "${create_response}" | jq .
  exit 1
fi
echo "Created video job ${video_id}"
echo "${create_response}" | jq .
while true; do
  status_response="$(curl -sS "${BASE_URL}/v1/videos/${video_id}")"
  status="$(echo "${status_response}" | jq -r '.status')"
  case "${status}" in
    queued|in_progress)
      echo "Video job ${video_id} status: ${status}"
      sleep "${POLL_INTERVAL}"
      ;;
    completed)
      echo "${status_response}" | jq .
      break
      ;;
    failed)
      echo "Video generation failed:"
      echo "${status_response}" | jq .
      exit 1
      ;;
    *)
      echo "Unexpected status response:"
      echo "${status_response}" | jq .
      exit 1
      ;;
  esac
done
curl -sS -L "${BASE_URL}/v1/videos/${video_id}/content" -o "${OUTPUT_PATH}"
echo "Saved video to ${OUTPUT_PATH}"
run_curl_image_to_video.sh
#!/bin/bash
# Wan2.2 image-to-video curl example using the async video job API.
set -euo pipefail
INPUT_IMAGE="${INPUT_IMAGE:-../../offline_inference/image_to_video/qwen-bear.png}"
BASE_URL="${BASE_URL:-http://localhost:8099}"
OUTPUT_PATH="${OUTPUT_PATH:-wan22_i2v_output.mp4}"
NEGATIVE_PROMPT="${NEGATIVE_PROMPT:-}"
SAMPLE_SOLVER="${SAMPLE_SOLVER:-}"
POLL_INTERVAL="${POLL_INTERVAL:-2}"
if [ ! -f "$INPUT_IMAGE" ]; then
  echo "Input image not found: $INPUT_IMAGE"
  exit 1
fi
create_cmd=(
  curl -sS -X POST "${BASE_URL}/v1/videos"
  -H "Accept: application/json"
  -F "prompt=A bear playing with yarn, smooth motion"
  -F "input_reference=@${INPUT_IMAGE}"
  -F "seconds=2"
  -F "size=832x480"
  -F "fps=16"
  -F "num_inference_steps=40"
  -F "guidance_scale=1.0"
  -F "guidance_scale_2=1.0"
  -F "boundary_ratio=0.875"
  -F "flow_shift=12.0"
  -F "seed=42"
)
if [ -n "${NEGATIVE_PROMPT}" ]; then
  create_cmd+=(-F "negative_prompt=${NEGATIVE_PROMPT}")
fi
if [ -n "${SAMPLE_SOLVER}" ]; then
  create_cmd+=(-F "extra_params={\"sample_solver\":\"${SAMPLE_SOLVER}\"}")
fi
create_response="$("${create_cmd[@]}")"
video_id="$(echo "${create_response}" | jq -r '.id')"
if [ -z "${video_id}" ] || [ "${video_id}" = "null" ]; then
  echo "Failed to create video job:"
  echo "${create_response}" | jq .
  exit 1
fi
echo "Created video job ${video_id}"
echo "${create_response}" | jq .
while true; do
  status_response="$(curl -sS "${BASE_URL}/v1/videos/${video_id}")"
  status="$(echo "${status_response}" | jq -r '.status')"
  case "${status}" in
    queued|in_progress)
      echo "Video job ${video_id} status: ${status}"
      sleep "${POLL_INTERVAL}"
      ;;
    completed)
      echo "${status_response}" | jq .
      break
      ;;
    failed)
      echo "Video generation failed:"
      echo "${status_response}" | jq .
      exit 1
      ;;
    *)
      echo "Unexpected status response:"
      echo "${status_response}" | jq .
      exit 1
      ;;
  esac
done
curl -sS -L "${BASE_URL}/v1/videos/${video_id}/content" -o "${OUTPUT_PATH}"
echo "Saved video to ${OUTPUT_PATH}"
run_server.sh
#!/bin/bash
# Wan2.2 image-to-video server start script
MODEL="${MODEL:-Wan-AI/Wan2.2-I2V-A14B-Diffusers}"
PORT="${PORT:-8099}"
CACHE_BACKEND="${CACHE_BACKEND:-none}"
ENABLE_CACHE_DIT_SUMMARY="${ENABLE_CACHE_DIT_SUMMARY:-0}"
echo "Starting Wan2.2 I2V server..."
echo "Model: $MODEL"
echo "Port: $PORT"
echo "Cache backend: $CACHE_BACKEND"
if [ "$ENABLE_CACHE_DIT_SUMMARY" != "0" ]; then
  echo "Cache-DiT summary: enabled"
fi
CACHE_BACKEND_FLAG=""
if [ "$CACHE_BACKEND" != "none" ]; then
  CACHE_BACKEND_FLAG="--cache-backend $CACHE_BACKEND"
fi
vllm serve "$MODEL" --omni \
  --port "$PORT" \
  $CACHE_BACKEND_FLAG \
  $(if [ "$ENABLE_CACHE_DIT_SUMMARY" != "0" ]; then echo "--enable-cache-dit-summary"; fi)
run_server_hunyuan_video_15.sh
#!/bin/bash
# HunyuanVideo-1.5 image-to-video online serving startup script
#
# 480p: ~35 GB VRAM (BF16), fits 1x A100 80GB
# 720p: needs FP8 + VAE tiling, ~35 GB VRAM
MODEL="${MODEL:-hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v}"
PORT="${PORT:-8099}"
FLOW_SHIFT="${FLOW_SHIFT:-5.0}"
QUANTIZATION="${QUANTIZATION:-}"
CACHE_BACKEND="${CACHE_BACKEND:-none}"
echo "Starting HunyuanVideo-1.5 I2V server..."
echo "Model: $MODEL"
echo "Port: $PORT"
echo "Flow shift: $FLOW_SHIFT"
echo "Quantization: ${QUANTIZATION:-none}"
echo "Cache backend: $CACHE_BACKEND"
EXTRA_FLAGS=""
if [ -n "$QUANTIZATION" ]; then
  EXTRA_FLAGS="$EXTRA_FLAGS --quantization $QUANTIZATION"
fi
if [ "$CACHE_BACKEND" != "none" ]; then
  EXTRA_FLAGS="$EXTRA_FLAGS --cache-backend $CACHE_BACKEND"
fi
vllm serve "$MODEL" --omni \
  --port "$PORT" \
  --flow-shift "$FLOW_SHIFT" \
  $EXTRA_FLAGS