Frame Interpolation¶

Overview¶

vLLM-Omni supports post-generation frame interpolation for supported video diffusion pipelines. This feature inserts synthesized intermediate frames between adjacent generated frames to improve temporal smoothness without rerunning the diffusion denoising loop.

Frame interpolation runs in the diffusion worker post-processing path instead of the API server encoding path. This allows the interpolation step to reuse the worker's current accelerator device and keeps the FastAPI event loop free from heavy synchronous PyTorch work.

For an input video with N generated frames and interpolation exponent exp, the output frame count is:

(N - 1) * 2**exp + 1

The output FPS is multiplied by 2**exp so the clip duration remains close to the original generated video.

Supported Pipelines¶

Frame interpolation is currently supported for:

WanPipeline (Wan2.2 text-to-video)
WanImageToVideoPipeline

Request Parameters¶

The video APIs /v1/videos and /v1/videos/sync accept:

Parameter	Type	Default	Description
`enable_frame_interpolation`	bool	`false`	Enable post-generation frame interpolation
`frame_interpolation_exp`	int	`1`	Interpolation exponent. `1=2x`, `2=4x`, etc.
`frame_interpolation_scale`	float	`1.0`	RIFE inference scale
`frame_interpolation_model_path`	str	`None`	Local directory or Hugging Face repo ID containing `flownet.pkl`

Execution Flow¶

For supported Wan2.2 pipelines, the execution order is:

Diffusion worker finishes denoising and decodes the raw video tensor.
Worker-side model-specific post-processing runs.
If frame interpolation is enabled, RIFE interpolates the decoded video tensor on the worker side and records a FPS multiplier in custom_output.
The API server receives the already-interpolated video and only performs MP4 export.

This design keeps interpolation close to the generated tensor and avoids introducing another heavyweight GPU context in the API server process.

Example¶

Start the server:

vllm serve Wan-AI/Wan2.2-T2V-A14B-Diffusers --omni --port 8091

Run a sync request with interpolation enabled:

curl -X POST http://localhost:8091/v1/videos/sync \
  -F "prompt=A dog running through a park" \
  -F "num_frames=81" \
  -F "width=832" \
  -F "height=480" \
  -F "fps=16" \
  -F "num_inference_steps=40" \
  -F "guidance_scale=1.0" \
  -F "guidance_scale_2=1.0" \
  -F "enable_frame_interpolation=true" \
  -F "frame_interpolation_exp=1" \
  -F "frame_interpolation_scale=1.0" \
  -F "seed=42" \
  -o sync_t2v_interpolated.mp4

Notes¶

This is a post-processing feature. It does not modify the diffusion denoising schedule.
Higher interpolation exponents increase post-processing time and memory usage.
If the interpolation model weights are not available locally, frame_interpolation_model_path may point to a Hugging Face repo containing flownet.pkl.