Text-To-Audio¶
Source https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_audio.
The stabilityai/stable-audio-open-1.0 pipeline generates audio from text prompts.
Prerequisites¶
If you use a gated model (e.g., stabilityai/stable-audio-open-1.0), ensure you have access:
- Accept Model License: Visit the model page on Hugging Face (e.g., [stabilityai/stable-audio-open-1.0]) and accept the user agreement.
- Authenticate: Log in to Hugging Face locally to access the gated model.
Local CLI Usage¶
python text_to_audio.py \
--model stabilityai/stable-audio-open-1.0 \
--prompt "The sound of a hammer hitting a wooden surface" \
--negative-prompt "Low quality" \
--seed 42 \
--guidance-scale 7.0 \
--audio-length 10.0 \
--num-inference-steps 100 \
--cache-backend tea_cache \
--output stable_audio_output.wav
To reduce per-GPU memory for multi-GPU inference, launch with HSDP:
python text_to_audio.py \
--model stabilityai/stable-audio-open-1.0 \
--prompt "The sound of a hammer hitting a wooden surface" \
--negative-prompt "Low quality" \
--seed 42 \
--guidance-scale 7.0 \
--audio-length 10.0 \
--num-inference-steps 100 \
--use-hsdp \
--hsdp-shard-size 2 \
--output stable_audio_output.wav
Key arguments:
--prompt: text description (string).--negative-prompt: negative prompt for classifier-free guidance.--seed: integer seed for deterministic generation.--guidance-scale: classifier-free guidance scale.--audio-length: audio duration in seconds.--num-inference-steps: diffusion sampling steps.(more steps = higher quality, slower).--use-hsdp: enable HSDP weight sharding for the Stable Audio DiT.--hsdp-shard-size: number of GPUs used for HSDP sharding.--hsdp-replicate-size: number of HSDP replica groups.--cache-backend: cache acceleration backend. Stable Audio currently supportstea_cache.--output: path to save the generated WAV file.
Example materials¶
text_to_audio.py
Large file omitted from the rendered docs. View it on GitHub: https://github.com/vllm-project/vllm-omni/blob/main/examples/offline_inference/text_to_audio/text_to_audio.py.