vllm_omni.reasoning.step_audio_reasoning_parser

logger module-attribute

logger = init_logger(__name__)

StepAudioReasoningParser

Bases: ReasoningParser

Reasoning parser for Step-Audio models.

Step-Audio supports two representations of thinking markers:

  1. Special tokens: <|THINK_START|> and <|THINK_END|> (single-token IDs, e.g. 151669 and 151670).

  2. Text markers: <think> and </think> (multi-token sequences, e.g. …).
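Because either representation can appear in decoded output, downstream code may want to map both onto one form before inspecting it. The following is a minimal sketch under that assumption, not the parser's actual code; `normalize_think_markers` and `SPECIAL_TO_TEXT` are hypothetical names, and only the four marker strings come from the class attributes documented on this page.

```python
# Marker strings taken from the THINK_*_SPECIAL / THINK_*_TEXT class attributes.
# The mapping and the helper below are illustrative assumptions.
SPECIAL_TO_TEXT = {
    "<|THINK_START|>": "<think>",
    "<|THINK_END|>": "</think>",
}


def normalize_think_markers(text: str) -> str:
    """Rewrite special-token markers as their text-marker equivalents."""
    for special, plain in SPECIAL_TO_TEXT.items():
        text = text.replace(special, plain)
    return text
```

After normalization, a single pair of markers is enough for any later string matching.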

THINK_END_SPECIAL class-attribute instance-attribute

THINK_END_SPECIAL = '<|THINK_END|>'

THINK_END_TEXT class-attribute instance-attribute

THINK_END_TEXT = '</think>'

THINK_START_SPECIAL class-attribute instance-attribute

THINK_START_SPECIAL = '<|THINK_START|>'

THINK_START_TEXT class-attribute instance-attribute

THINK_START_TEXT = '<think>'

think_end_special_id instance-attribute

think_end_special_id: int = get(THINK_END_SPECIAL, -1)

think_end_text_id instance-attribute

think_end_text_id: int = get(THINK_END_TEXT, -1)

think_end_token instance-attribute

think_end_token = THINK_END_TEXT

think_end_token_id instance-attribute

think_end_token_id: int = (
    think_end_special_id
    if think_end_special_id != -1
    else think_end_text_id
)

think_start_special_id instance-attribute

think_start_special_id: int = get(THINK_START_SPECIAL, -1)

think_start_text_id instance-attribute

think_start_text_id: int = get(THINK_START_TEXT, -1)

think_start_token instance-attribute

think_start_token = THINK_START_TEXT

think_start_token_id instance-attribute

think_start_token_id: int = (
    think_start_special_id
    if think_start_special_id != -1
    else think_start_text_id
)
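The start and end IDs above follow the same fallback rule: prefer the special-token ID, and use the text-marker ID only when the special token is missing. A minimal sketch of that rule, assuming `get` is a lookup on a token-to-ID mapping; `resolve_marker_id` and the toy vocabularies are hypothetical.

```python
def resolve_marker_id(vocab: dict[str, int], special: str, text: str) -> int:
    """Return the special-token ID if present, else the text-marker ID.

    Returns -1 when neither marker is in the vocabulary.
    """
    special_id = vocab.get(special, -1)
    text_id = vocab.get(text, -1)
    return special_id if special_id != -1 else text_id


# Toy vocabularies for illustration only; real IDs depend on the tokenizer.
with_special = {"<|THINK_END|>": 151670, "</think>": 7}
without_special = {"</think>": 7}

end_id = resolve_marker_id(with_special, "<|THINK_END|>", "</think>")
fallback_id = resolve_marker_id(without_special, "<|THINK_END|>", "</think>")
missing_id = resolve_marker_id({}, "<|THINK_END|>", "</think>")
```

Note that when neither marker resolves, the result stays at the -1 sentinel, so callers should not assume the ID is valid.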

count_reasoning_tokens

count_reasoning_tokens(token_ids: Sequence[int]) -> int

Count tokens within thinking spans.
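One way such a count could work is a single pass that tracks whether the current token lies inside a thinking span. This is a sketch under stated assumptions, not the method's actual implementation: `count_span_tokens` is a hypothetical name, one start ID and one end ID are used, the marker tokens themselves are not counted, and tokens after an unclosed start marker are counted.

```python
from collections.abc import Sequence


def count_span_tokens(token_ids: Sequence[int], start_id: int, end_id: int) -> int:
    """Count tokens strictly between start and end markers."""
    count = 0
    inside = False
    for tok in token_ids:
        if tok == start_id:
            inside = True       # entering a thinking span
        elif tok == end_id:
            inside = False      # leaving a thinking span
        elif inside:
            count += 1          # ordinary token inside a span
    return count
```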

extract_content_ids

extract_content_ids(input_ids: list[int]) -> list[int]

extract_reasoning

extract_reasoning(
    model_output: str,
    request: ChatCompletionRequest | ResponsesRequest,
) -> tuple[str | None, str | None]

extract_reasoning_streaming

extract_reasoning_streaming(
    previous_text: str,
    current_text: str,
    delta_text: str,
    previous_token_ids: Sequence[int],
    current_token_ids: Sequence[int],
    delta_token_ids: Sequence[int],
) -> DeltaMessage | None

is_reasoning_end

is_reasoning_end(input_ids: Sequence[int]) -> bool

Check if reasoning has ended in the given token sequence.

When called with prompt token IDs (by the serving layer), the prompt may contain think markers from previous assistant turns. In multi-turn conversations the prompt can include both start and end markers, with the last marker being a start marker (from the generation prompt). In that case reasoning has NOT ended: the model is about to generate inside a new think block.

We therefore find the last think marker (start or end) in the decoded text and only return True if it is an end marker.
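The "last marker wins" rule can be sketched on already-decoded text as follows. This is an illustration only: `last_marker_is_end` is a hypothetical helper, decoding the token IDs is assumed to have happened elsewhere, and the marker strings are the class attributes listed on this page.

```python
START_MARKERS = ("<|THINK_START|>", "<think>")
END_MARKERS = ("<|THINK_END|>", "</think>")


def last_marker_is_end(decoded: str) -> bool:
    """Return True only if the last think marker in `decoded` is an end marker."""
    # str.rfind returns -1 when a marker is absent, so the comparison
    # below is False when no marker appears at all.
    last_start = max(decoded.rfind(m) for m in START_MARKERS)
    last_end = max(decoded.rfind(m) for m in END_MARKERS)
    return last_end > last_start
```

For a multi-turn prompt such as `"<think>a</think>hi<think>"`, the trailing start marker is the last one, so the helper reports that reasoning has not ended.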

is_reasoning_end_streaming

is_reasoning_end_streaming(
    input_ids: Sequence[int], delta_ids: Iterable[int]
) -> bool