vllm_omni.model_executor.models.step_audio2 ¶
Modules:
| Name | Description |
|---|---|
| step_audio2 | |
| step_audio2_constants | Step-Audio2 configuration constants - Single Source of Truth. |
| step_audio2_thinker | Step-Audio2 Thinker - Stage 1 LLM for Audio Understanding |
| step_audio2_token2wav | |
StepAudio2ForConditionalGeneration ¶
Bases: Module, SupportsMultiModal, SupportsPP
Step-Audio2 Main Controller
Manages a two-stage inference pipeline:
- Stage 1 (Thinker): Audio understanding and token generation
- Stage 2 (Token2Wav): Audio token to waveform synthesis
Usage
# Stage 1: Thinker
model = StepAudio2ForConditionalGeneration(
    vllm_config=config,
    model_stage="thinker",
)
# Stage 2: Token2Wav
model = StepAudio2ForConditionalGeneration(
    vllm_config=config,
    model_stage="token2wav",
)
make_empty_intermediate_tensors instance-attribute ¶
make_empty_intermediate_tensors = (
make_empty_intermediate_tensors
if model_stage == "thinker"
else (lambda: None)
)
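The selection above can be illustrated in isolation. This is a minimal sketch, not the module's code: `select_intermediate_tensor_factory` and `fake_thinker_factory` are hypothetical names, and the stand-in factory returns a plain dict rather than real intermediate tensors. It only shows that the thinker stage reuses the thinker's factory while any other stage gets a no-argument callable returning `None`.

```python
from typing import Any, Callable


def select_intermediate_tensor_factory(
    model_stage: str,
    thinker_factory: Callable[..., Any],
) -> Callable[..., Any]:
    """Hypothetical helper mirroring the conditional shown above."""
    # Thinker stage: reuse the thinker's own factory.
    # Any other stage: a callable that produces nothing.
    return thinker_factory if model_stage == "thinker" else (lambda: None)


def fake_thinker_factory() -> dict:
    """Stand-in for the thinker's factory (assumption, for illustration)."""
    return {"hidden_states": []}


thinker_make = select_intermediate_tensor_factory("thinker", fake_thinker_factory)
token2wav_make = select_intermediate_tensor_factory("token2wav", fake_thinker_factory)

print(thinker_make is fake_thinker_factory)
print(token2wav_make())
```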
model_stage instance-attribute ¶
model_stage = (
"thinker"
if raw_model_stage in ("thinker", "step_audio2_thinker")
else raw_model_stage
)
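The alias handling above can be sketched as a standalone function. The name `normalize_model_stage` is hypothetical and not part of the module; the expression itself mirrors the attribute definition shown above.

```python
def normalize_model_stage(raw_model_stage: str) -> str:
    """Map known thinker aliases to "thinker"; pass other values through."""
    return (
        "thinker"
        if raw_model_stage in ("thinker", "step_audio2_thinker")
        else raw_model_stage
    )


# Both thinker spellings collapse to the same stage name.
print(normalize_model_stage("thinker"))
print(normalize_model_stage("step_audio2_thinker"))
# Any other value is returned unchanged.
print(normalize_model_stage("token2wav"))
```

Note that values other than the two thinker aliases are passed through unchanged, so this expression alone does not validate the stage name.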
thinker instance-attribute ¶
thinker = init_vllm_registered_model(
vllm_config=vllm_config,
prefix=maybe_prefix(prefix, "thinker"),
hf_config=config,
architectures=[
"StepAudio2ThinkerForConditionalGeneration"
],
)
compute_logits ¶
Compute logits from hidden states
embed_input_ids ¶
Explicit vLLM embedding hook for both stages.
forward ¶
forward(
input_ids: Tensor,
positions: Tensor,
intermediate_tensors: IntermediateTensors | None = None,
inputs_embeds: Tensor | None = None,
**kwargs,
)
Forward pass through the model
For Thinker: Returns hidden states/logits
For Token2Wav: Returns waveform
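The stage-dependent return value can be pictured as a simple dispatch. The sketch below is an assumption about the general shape only: `TwoStageDispatcher`, its constructor arguments, and the `ValueError` for unknown stages are all hypothetical, and plain callables stand in for the real submodules.

```python
from typing import Any, Callable, Optional


class TwoStageDispatcher:
    """Hypothetical stand-in that routes forward() by model stage."""

    def __init__(
        self,
        model_stage: str,
        thinker: Optional[Callable[..., Any]] = None,
        token2wav: Optional[Callable[..., Any]] = None,
    ) -> None:
        self.model_stage = model_stage
        self.thinker = thinker
        self.token2wav = token2wav

    def forward(self, input_ids: Any, **kwargs: Any) -> Any:
        # Thinker stage: delegate to the LLM stand-in.
        if self.model_stage == "thinker":
            return self.thinker(input_ids, **kwargs)
        # Token2Wav stage: delegate to the synthesis stand-in.
        if self.model_stage == "token2wav":
            return self.token2wav(input_ids, **kwargs)
        # Assumption: unknown stages are rejected explicitly.
        raise ValueError(f"Unknown model_stage: {self.model_stage!r}")


stage1 = TwoStageDispatcher("thinker", thinker=lambda ids: ("hidden", ids))
stage2 = TwoStageDispatcher("token2wav", token2wav=lambda ids: ("waveform", ids))

print(stage1.forward([1, 2, 3]))
print(stage2.forward([4, 5]))
```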
get_input_embeddings ¶
Compatibility helper used by older call sites.
get_multimodal_embeddings ¶
Get multimodal embeddings - only used in Thinker stage.
get_placeholder_str classmethod ¶
Get placeholder string for a modality
Returns:
| Type | Description |
|---|---|
| str \| None | For audio: " |
move_submodules_to_devices ¶
move_submodules_to_devices(
*,
thinker_device: str | device | None = None,
token2wav_device: str | device | None = None,
) -> None
Optionally move thinker/token2wav to different devices
Example
model.move_submodules_to_devices(
    thinker_device='cuda:0',
    token2wav_device='cuda:1',
)