vllm_omni.metrics.modality ¶

OmniModalityMetrics — per-modality Prometheus families (audio path only).

Defines seven audio metric families with business-level semantics. Text-path metrics (TTFT / ITL / TPOT / e2e) are NOT defined here; they come from the upstream vllm:*{stage="thinker", ...} families exposed through the OmniPrometheusStatLogger wrapper.

Contents:

- Audio family declarations (Histograms + Counters)
- OmniModalityMetrics: label-bound observe API for the audio family
- observe_modality_at_finalize: dispatcher called from omni_base's e2e finalize hook; currently handles the audio path only.
- observe_audio_first_packet: TTFP emit from the streaming SSE first audio packet.
- observe_audio_streaming_finalize: emits audio_underrun_s + audio_continuity_ok_total at SSE close using accumulated per-chunk arrival timestamps.
- _extract_mm_output / _count_audio_frames: shape-tolerant helpers for the heterogeneous multimodal_output payloads emitted by different audio pipelines.

OmniModalityMetrics ¶

Per-modality observe API. Stage/replica are passed at observe time because a single OmniModalityMetrics instance per pipeline serves all stage+replica combinations.
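The "labels at observe time" design can be illustrated with a small in-memory stand-in. This is only a sketch: the class name `ModalityMetricsSketch`, its dictionaries, and the stage name `"talker"` are hypothetical, and the real class presumably forwards to labelled Prometheus children rather than Python dicts.

```python
from collections import defaultdict


class ModalityMetricsSketch:
    """Hypothetical stand-in: one instance, labels supplied on every call."""

    def __init__(self) -> None:
        # Keyed by (stage, replica). A Prometheus family would instead keep
        # one labelled child per distinct label tuple.
        self.frames = defaultdict(int)
        self.ttfp = defaultdict(list)

    def inc_audio_frames(self, stage: str, replica: str, n_frames: int) -> None:
        self.frames[(stage, replica)] += n_frames

    def observe_audio_ttfp(self, stage: str, replica: str, ttfp_seconds: float) -> None:
        self.ttfp[(stage, replica)].append(ttfp_seconds)


# One shared instance serves every stage/replica combination.
metrics = ModalityMetricsSketch()
metrics.inc_audio_frames("talker", "0", 120)
metrics.inc_audio_frames("talker", "1", 80)
metrics.inc_audio_frames("talker", "0", 30)
metrics.observe_audio_ttfp("talker", "0", 0.25)
```

Because no label is bound at construction, the pipeline does not need one metrics object per replica; each call site supplies the labels it already knows.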

inc_audio_continuity_ok ¶

inc_audio_continuity_ok(
    stage: str, replica: str, threshold_ms: int
) -> None

inc_audio_frames ¶

inc_audio_frames(
    stage: str, replica: str, n_frames: int
) -> None

inc_audio_skipped ¶

inc_audio_skipped(
    stage: str, replica: str, reason: str
) -> None

observe_audio_duration ¶

observe_audio_duration(
    stage: str, replica: str, duration_seconds: float
) -> None

observe_audio_rtf ¶

observe_audio_rtf(
    stage: str, replica: str, rtf: float
) -> None

observe_audio_ttfp ¶

observe_audio_ttfp(
    stage: str, replica: str, ttfp_seconds: float
) -> None

observe_audio_underrun ¶

observe_audio_underrun(
    stage: str, replica: str, underrun_s: float
) -> None

observe_audio_first_packet ¶

observe_audio_first_packet(
    mod_metrics: OmniModalityMetrics,
    *,
    stage_id: int,
    replica_id: int | None,
    arrival_ts: float,
    now_ts: float,
) -> None

Observe audio_ttfp_s on a request's first audio packet.

The caller is responsible for the once-per-request guard (e.g. checking that ClientRequestState.first_audio_ts is None) so that this function fires at most once per request_id. The function skips defensively when replica_id or arrival_ts is missing or unusable: both can legitimately be absent in error paths, and dropping the sample is preferable to emitting it under a wrong (stage, replica).
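The division of labour between caller and function might look like the following sketch. `RecordingMetrics`, `RequestStateSketch`, `first_packet_sketch`, and `on_audio_packet` are hypothetical names; the exact skip conditions and the conversion of the ids to string labels are assumptions, not the module's confirmed behaviour.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RecordingMetrics:
    """Hypothetical stand-in that records calls instead of emitting metrics."""
    samples: list = field(default_factory=list)

    def observe_audio_ttfp(self, stage: str, replica: str, ttfp_seconds: float) -> None:
        self.samples.append((stage, replica, ttfp_seconds))


@dataclass
class RequestStateSketch:
    """Hypothetical per-request state holding the once-per-request marker."""
    arrival_ts: Optional[float]
    first_audio_ts: Optional[float] = None


def first_packet_sketch(mod_metrics, *, stage_id, replica_id, arrival_ts, now_ts) -> None:
    # Assumed skip rules: drop the sample rather than guess labels or latency.
    if replica_id is None or arrival_ts is None or arrival_ts <= 0 or now_ts < arrival_ts:
        return
    mod_metrics.observe_audio_ttfp(str(stage_id), str(replica_id), now_ts - arrival_ts)


def on_audio_packet(state, mod_metrics, *, stage_id, replica_id, now_ts) -> None:
    # Caller-side guard: only the first audio packet of a request is observed.
    if state.first_audio_ts is not None:
        return
    state.first_audio_ts = now_ts
    first_packet_sketch(
        mod_metrics,
        stage_id=stage_id,
        replica_id=replica_id,
        arrival_ts=state.arrival_ts,
        now_ts=now_ts,
    )


recorder = RecordingMetrics()
state = RequestStateSketch(arrival_ts=100.0)
on_audio_packet(state, recorder, stage_id=1, replica_id=0, now_ts=100.25)
on_audio_packet(state, recorder, stage_id=1, replica_id=0, now_ts=100.5)  # ignored
```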

observe_audio_streaming_finalize ¶

observe_audio_streaming_finalize(
    mod_metrics: OmniModalityMetrics,
    *,
    stage_id: int,
    replica_id: int | None,
    chunk_arrival_times_s: list[float],
    chunk_bytes: list[int],
    sample_rate: int,
    threshold_s: float = AUDIO_CONTINUITY_DEFAULT_THRESHOLD_S,
) -> None

Emit audio_underrun_s + audio_continuity_ok_total at request end.

Reuses the math from vllm_omni.benchmarks.audio_continuity so the server-side and bench-side definitions stay aligned. Caller is responsible for collecting per-chunk arrival timestamps and byte sizes during the streaming response.
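The math in vllm_omni.benchmarks.audio_continuity is not reproduced on this page, so the following is only one plausible model of underrun, written under stated assumptions: 16-bit mono PCM (2 bytes per sample), playback starting when the first chunk arrives, and underrun defined as the total time the playback buffer sits empty while waiting for the next chunk. The function name, the `bytes_per_sample` parameter, and the threshold value used below are hypothetical.

```python
def underrun_sketch(chunk_arrival_times_s, chunk_bytes, sample_rate, bytes_per_sample=2):
    """Return total seconds of playback starvation, or None if inputs are unusable."""
    if (
        not chunk_arrival_times_s
        or len(chunk_arrival_times_s) != len(chunk_bytes)
        or sample_rate <= 0
    ):
        return None

    # playback_end: wall-clock time at which the audio buffered so far runs out.
    playback_end = chunk_arrival_times_s[0]
    underrun_s = 0.0
    for arrival, nbytes in zip(chunk_arrival_times_s, chunk_bytes):
        if arrival > playback_end:
            # The buffer drained before this chunk arrived: the listener hears a gap.
            underrun_s += arrival - playback_end
            playback_end = arrival
        # Appending the chunk extends playback by its audio duration.
        playback_end += nbytes / (bytes_per_sample * sample_rate)
    return underrun_s


# Three 1-second chunks at 16 kHz; the third arrives 0.5 s after the buffer drains.
late = underrun_sketch([0.0, 0.5, 2.5], [32000, 32000, 32000], 16000)
on_time = underrun_sketch([0.0, 0.5, 1.5], [32000, 32000, 32000], 16000)

HYPOTHETICAL_THRESHOLD_S = 0.1  # illustrative only, not the module's default
continuity_ok = on_time <= HYPOTHETICAL_THRESHOLD_S
```

Under this model, a request would count toward audio_continuity_ok_total only when its accumulated underrun stays at or below the threshold.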

observe_modality_at_finalize ¶

observe_modality_at_finalize(
    mod_metrics: OmniModalityMetrics,
    *,
    output_type: str | None,
    stage_id: int,
    replica_id: int | None,
    stage_metrics: Any,
    engine_outputs: Any,
) -> None

Route audio-path observations for a finalized request.

Used by omni_base._process_single_result inside the e2e_done finalize guard, so it fires once per request. Skips the text path (covered by upstream vllm:*{stage="thinker", ...}) and any case where required inputs are missing, so the caller does not need to pre-validate.

audio_ttfp is intentionally NOT observed here; it's emitted by the streaming hook at first-packet time, not at finalize.
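The routing and skip behaviour described above can be sketched roughly as follows. This is not the real dispatcher: the simplified signature, the `"audio"` literal for output_type, and the `audio_duration_s` / `generation_time_s` inputs are hypothetical stand-ins for whatever is extracted from stage_metrics and engine_outputs. The real-time factor is computed with the usual definition, generation time divided by audio duration.

```python
class FinalizeRecorder:
    """Hypothetical stand-in that records observations for inspection."""

    def __init__(self) -> None:
        self.durations = []
        self.rtfs = []

    def observe_audio_duration(self, stage: str, replica: str, duration_seconds: float) -> None:
        self.durations.append((stage, replica, duration_seconds))

    def observe_audio_rtf(self, stage: str, replica: str, rtf: float) -> None:
        self.rtfs.append((stage, replica, rtf))


def finalize_sketch(mod_metrics, *, output_type, stage_id, replica_id,
                    audio_duration_s, generation_time_s) -> None:
    # The text path is covered by the upstream families, so nothing is emitted.
    if output_type != "audio":
        return
    # Missing or unusable inputs: skip silently, the caller need not pre-validate.
    if replica_id is None or audio_duration_s is None or generation_time_s is None:
        return
    if audio_duration_s <= 0:
        return

    stage, replica = str(stage_id), str(replica_id)
    mod_metrics.observe_audio_duration(stage, replica, audio_duration_s)
    mod_metrics.observe_audio_rtf(stage, replica, generation_time_s / audio_duration_s)
    # audio_ttfp is deliberately absent: it is emitted at first-packet time.


rec = FinalizeRecorder()
finalize_sketch(rec, output_type="audio", stage_id=1, replica_id=0,
                audio_duration_s=2.0, generation_time_s=1.0)
finalize_sketch(rec, output_type="text", stage_id=0, replica_id=0,
                audio_duration_s=2.0, generation_time_s=1.0)   # skipped
finalize_sketch(rec, output_type="audio", stage_id=1, replica_id=None,
                audio_duration_s=2.0, generation_time_s=1.0)   # skipped
```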