Dynin-Omni Offline End2End Example

Source: https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/dynin_omni

This folder contains a unified offline inference entrypoint:

end2end.py

1. Environment Setup

Run from repository root:

cd <REPO_ROOT>

If needed, install this repo in editable mode:

pip install -e .

2. Extra Dependencies (EMOVA)

Install the following packages for EMOVA-related components:

pip install \
  "phonemizer==3.3.0" \
  "Unidecode==1.4.0" \
  "hydra-core==1.3.2" \
  "pytorch-lightning==1.1.0" \
  "wget==3.2" \
  "wrapt==2.1.1" \
  "onnx==1.20.1" \
  "frozendict==2.4.7" \
  "inflect==7.5.0" \
  "braceexpand==0.1.7" \
  "webdataset==1.0.2" \
  "torch-stft==0.1.4" \
  "editdistance==0.8.1"

3. Hardware and VRAM Requirements

This example uses a 3-stage pipeline on one GPU by default (dynin_omni.yaml):

Stage-0 (token2text): gpu_memory_utilization: 0.5
Stage-1 (token2image): gpu_memory_utilization: 0.1
Stage-2 (token2audio): gpu_memory_utilization: 0.1

Requested GPU Memory Budget from `gpu_memory_utilization`

Stage	Utilization	A100 80GB	H200 141GB
Stage-0 (token2text)	0.5	~40.0 GB	~70.5 GB
Stage-1 (token2image)	0.1	~8.0 GB	~14.1 GB
Stage-2 (token2audio)	0.1	~8.0 GB	~14.1 GB
Total requested budget	0.7	~56.0 GB	~98.7 GB
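The table values are each stage's utilization fraction multiplied by the card's nominal capacity. A minimal sketch of that arithmetic follows; the function name `requested_budget_gb` is hypothetical and not part of the repo, and capacities are taken at their nominal GB figures.

```python
# Per-stage gpu_memory_utilization values quoted for dynin_omni.yaml.
STAGE_UTILIZATION = {
    "Stage-0 (token2text)": 0.5,
    "Stage-1 (token2image)": 0.1,
    "Stage-2 (token2audio)": 0.1,
}


def requested_budget_gb(total_gb, utilization=STAGE_UTILIZATION):
    """Return each stage's requested budget in GB, plus the summed total.

    Hypothetical helper: it only reproduces the table arithmetic
    (fraction * nominal capacity) and does not query a real GPU.
    """
    budget = {stage: round(frac * total_gb, 1) for stage, frac in utilization.items()}
    budget["Total requested budget"] = round(sum(utilization.values()) * total_gb, 1)
    return budget


if __name__ == "__main__":
    for name, total_gb in (("A100 80GB", 80.0), ("H200 141GB", 141.0)):
        print(name, requested_budget_gb(total_gb))
```

The same helper can be pointed at another card by passing its nominal capacity, which gives a quick check of whether a 0.7 total request is plausible there.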

Observed Runtime Signal (from a sample run log)

Stage-0 reported "Model loading took 15.12 GiB memory", which indicates the weights footprint.
Stages 1/2 can still add runtime memory depending on task path and backend allocations.
Keep extra headroom for CUDA/PyTorch overhead and temporary allocations.
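To put the logged weights footprint next to the Stage-0 request, here is a rough sketch. It assumes an 80 GiB card for simplicity (nominal GB and reported GiB differ slightly), and `stage_headroom_gib` is a hypothetical helper. How the remainder is split between caches, activations, and temporary buffers depends on the backend.

```python
def stage_headroom_gib(total_gib, utilization, weights_gib):
    """Budget left in one stage after its weights are loaded, in GiB.

    Hypothetical helper: a simple subtraction, not a measurement.
    """
    return total_gib * utilization - weights_gib


if __name__ == "__main__":
    # Assumptions: 80 GiB card, Stage-0 at 0.5, weights at the logged 15.12 GiB.
    left = stage_headroom_gib(80.0, 0.5, 15.12)
    print(f"Stage-0 budget left after weights: {left:.2f} GiB")
```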

GPU Compatibility

Confirmed target GPUs for this setup: NVIDIA H200, NVIDIA A100.
The Dynin tests in this repo's CI/e2e suite also carry CUDA L4 markers.

4. End2End Run Examples

# t2t (text to text)
python <REPO_ROOT>/examples/offline_inference/dynin_omni/end2end.py \
  --task t2t --model snu-aidas/Dynin-Omni --text "<INSTRUCTION_TEXT>"

# i2t (image to text)
python <REPO_ROOT>/examples/offline_inference/dynin_omni/end2end.py \
  --task i2t --model snu-aidas/Dynin-Omni --image <IMAGE_PATH> --text "Please describe this image in detail."

# s2t (speech to text)
python <REPO_ROOT>/examples/offline_inference/dynin_omni/end2end.py \
  --task s2t --model snu-aidas/Dynin-Omni --audio <AUDIO_PATH> --text "Transcribe the given audio."

# t2i (text to image)
python <REPO_ROOT>/examples/offline_inference/dynin_omni/end2end.py \
  --task t2i --model snu-aidas/Dynin-Omni --text "<INSTRUCTION_TEXT>"

# v2t (video to text)
python <REPO_ROOT>/examples/offline_inference/dynin_omni/end2end.py \
  --task v2t --model snu-aidas/Dynin-Omni --video <VIDEO_PATH> --text "Describe this video in detail."

# i2i (image to image)
python <REPO_ROOT>/examples/offline_inference/dynin_omni/end2end.py \
  --task i2i --model snu-aidas/Dynin-Omni --image <IMAGE_PATH> --text "<INSTRUCTION_TEXT>"

# t2s (text to speech)
python <REPO_ROOT>/examples/offline_inference/dynin_omni/end2end.py \
  --task t2s --model snu-aidas/Dynin-Omni --text "<INSTRUCTION_TEXT>"
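The commands above differ only in the task name and the optional media flag, so they can be assembled programmatically when scripting several runs. The following is a sketch, not part of the repo: `build_command` and `TASK_MEDIA_FLAG` are hypothetical names, the flag mapping is read off the examples above, and `cat.png` is a placeholder path.

```python
import shlex
import sys

SCRIPT = "examples/offline_inference/dynin_omni/end2end.py"
MODEL = "snu-aidas/Dynin-Omni"

# Media flag used by each task in the examples above (None means text only).
TASK_MEDIA_FLAG = {
    "t2t": None,
    "t2i": None,
    "t2s": None,
    "i2t": "--image",
    "i2i": "--image",
    "s2t": "--audio",
    "v2t": "--video",
}


def build_command(task, text, media_path=None):
    """Build the argv list for one end2end.py run (hypothetical helper)."""
    if task not in TASK_MEDIA_FLAG:
        raise ValueError(f"unknown task: {task!r}")
    flag = TASK_MEDIA_FLAG[task]
    if flag is not None and media_path is None:
        raise ValueError(f"task {task!r} needs a path for {flag}")
    cmd = [sys.executable, SCRIPT, "--task", task, "--model", MODEL]
    if flag is not None:
        cmd += [flag, str(media_path)]
    cmd += ["--text", text]
    return cmd


if __name__ == "__main__":
    cmd = build_command("i2t", "Please describe this image in detail.", "cat.png")
    # shlex.join quotes the text argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
    # To execute, run from the repository root:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with instruction text that contains spaces or quotes.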

5. Notes

Outputs are saved under task-specific directories in /tmp by default.
Override the output path with --output-dir.
To force local config resolution, pass --dynin-config-path <PATH_TO_DYNIN_OMNI_YAML>.
If you see the warning "max_num_batched_tokens (32768) exceeds max_num_seqs * max_model_len (4096)", reduce max_num_batched_tokens in the stage config (the CI config, for example, uses 4096).
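The warning in the last note fires when the batched-token limit is larger than the product of the other two settings. A small sketch of that comparison follows; the function names are hypothetical, and the split of 4096 into max_num_seqs=1 and max_model_len=4096 is an assumption, since the warning only reports the product.

```python
def exceeds_limit(max_num_batched_tokens, max_num_seqs, max_model_len):
    """True when the setting would trigger the warning quoted in the notes."""
    return max_num_batched_tokens > max_num_seqs * max_model_len


def suggested_batched_tokens(max_num_batched_tokens, max_num_seqs, max_model_len):
    """Largest value that stays within max_num_seqs * max_model_len."""
    return min(max_num_batched_tokens, max_num_seqs * max_model_len)


if __name__ == "__main__":
    # Assumed split: one sequence of up to 4096 tokens (product = 4096).
    print(exceeds_limit(32768, 1, 4096))
    print(suggested_batched_tokens(32768, 1, 4096))
```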

Example materials

end2end.py

Large file omitted from the rendered docs. View it on GitHub: https://github.com/vllm-project/vllm-omni/blob/main/examples/offline_inference/dynin_omni/end2end.py.