Qwen2.5-Omni¶
Source https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen2_5_omni.
Setup¶
Please refer to the stage configuration documentation to configure memory allocation appropriately for your hardware setup.
Run examples¶
Multiple Prompts¶
Change into the example folder (examples/offline_inference/qwen2_5_omni in the repository).
Then run the command below. Note: to process large volumes of data, it uses py_generator mode, which returns a Python generator from the Omni class.
Single Prompt¶
Change into the example folder (examples/offline_inference/qwen2_5_omni in the repository).
Then run the command below.
Modality control¶
To control the output modalities, for example to output text only, run the command below:
Using Local Media Files¶
The end2end.py script supports local media files (audio, video, image) via CLI arguments:
# Use single local media files
python end2end.py --query-type use_image --image-path /path/to/image.jpg
python end2end.py --query-type use_video --video-path /path/to/video.mp4
python end2end.py --query-type use_audio --audio-path /path/to/audio.wav
# Combine multiple local media files
python end2end.py --query-type mixed_modalities \
    --video-path /path/to/video.mp4 \
    --image-path /path/to/image.jpg \
    --audio-path /path/to/audio.wav
# Use audio from video file
python end2end.py --query-type use_audio_in_video --video-path /path/to/video.mp4
If media file paths are not provided, the script uses default assets.
Supported query types:
- use_image: Image input only
- use_video: Video input only
- use_audio: Audio input only
- mixed_modalities: Audio + image + video
- use_audio_in_video: Extract audio from video
- text: Text-only query
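When end2end.py is driven from another Python script, it can help to assemble the command line programmatically and reject unknown query types early. The following is a minimal sketch, not part of the example folder: the helper name build_end2end_cmd is hypothetical, and only the flags and query types listed above are assumed to exist.

```python
import shlex

# Query types copied from the list above.
SUPPORTED_QUERY_TYPES = {
    "use_image",
    "use_video",
    "use_audio",
    "mixed_modalities",
    "use_audio_in_video",
    "text",
}


def build_end2end_cmd(query_type, image_path=None, video_path=None, audio_path=None):
    """Hypothetical helper: build an argv list for end2end.py."""
    if query_type not in SUPPORTED_QUERY_TYPES:
        raise ValueError(f"unsupported query type: {query_type!r}")
    cmd = ["python", "end2end.py", "--query-type", query_type]
    # Only pass the media flags that were given; the script is documented
    # to fall back to default assets when a path is omitted.
    for flag, path in (
        ("--video-path", video_path),
        ("--image-path", image_path),
        ("--audio-path", audio_path),
    ):
        if path is not None:
            cmd += [flag, path]
    return cmd


cmd = build_end2end_cmd("use_audio_in_video", video_path="/path/to/video.mp4")
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run as-is; shlex.join is used here only to print a copy-pasteable shell line.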
Example materials¶
end2end.py
Large file omitted from the rendered docs. View it on GitHub: https://github.com/vllm-project/vllm-omni/blob/main/examples/offline_inference/qwen2_5_omni/end2end.py.
extract_prompts.py
#!/usr/bin/env python3
import argparse


def extract_prompt(line: str) -> str | None:
    # Extract the content between the first '|' and the second '|'
    i = line.find("|")
    if i == -1:
        return None
    j = line.find("|", i + 1)
    if j == -1:
        return None
    return line[i + 1 : j].strip()


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", "-i", required=True, help="Input .lst file path")
    parser.add_argument("--output", "-o", required=True, help="Output file path")
    parser.add_argument(
        "--topk",
        "-k",
        type=int,
        default=100,
        help="Extract the top K prompts (default: 100)",
    )
    args = parser.parse_args()
    prompts = []
    with open(args.input, encoding="utf-8", errors="ignore") as f:
        for line in f:
            if len(prompts) >= args.topk:
                break
            p = extract_prompt(line.rstrip("\n"))
            if p:
                prompts.append(p)
    with open(args.output, "w", encoding="utf-8") as f:
        for p in prompts:
            f.write(p + "\n")


if __name__ == "__main__":
    main()
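To illustrate how extract_prompts.py treats individual lines, the sketch below reuses the extract_prompt function from the script and applies it to a few made-up lines. The sample lines are assumptions for illustration only; the actual layout of the .lst input is not shown on this page beyond the use of '|' as a delimiter.

```python
from __future__ import annotations


def extract_prompt(line: str) -> str | None:
    # Same logic as in extract_prompts.py: take the text between
    # the first and the second '|' and strip surrounding whitespace.
    i = line.find("|")
    if i == -1:
        return None
    j = line.find("|", i + 1)
    if j == -1:
        return None
    return line[i + 1 : j].strip()


# Hypothetical sample lines, not taken from a real .lst file.
samples = [
    "utt_0001| What is in this picture? |extra",  # two delimiters: prompt extracted
    "no delimiters here",                         # no '|': returns None
    "only|one",                                   # a single '|': returns None
    "utt_0002||extra",                            # empty field: returns ""
]
results = [extract_prompt(s) for s in samples]
print(results)
```

Lines that yield None or an empty string are skipped by main(), because it appends a prompt only when the extracted value is truthy. The script itself is invoked with the flags it defines: --input, --output, and optionally --topk.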