vllm_omni.metrics.modality ¶
OmniModalityMetrics — per-modality Prometheus families (audio path only).
Defines seven audio business-semantic metric families. Text-path metrics (TTFT / ITL / TPOT / e2e) are NOT defined here; they come from the upstream vllm:*{stage="thinker", ...} families exposed through the OmniPrometheusStatLogger wrapper.
Contents:
- Audio family declarations (Histograms + Counters)
- OmniModalityMetrics: label-bound observe API for the audio family
- observe_modality_at_finalize: dispatcher called from omni_base's e2e finalize hook; currently handles the audio path only.
- observe_audio_first_packet: TTFP emit from the streaming SSE first audio packet.
- observe_audio_streaming_finalize: emits audio_underrun_s + audio_continuity_ok_total at SSE close using accumulated per-chunk arrival timestamps.
- _extract_mm_output / _count_audio_frames: shape-tolerant helpers for the heterogeneous multimodal_output payloads emitted by different audio pipelines.
OmniModalityMetrics ¶
observe_audio_first_packet ¶
observe_audio_first_packet(
mod_metrics: OmniModalityMetrics,
*,
stage_id: int,
replica_id: int | None,
arrival_ts: float,
now_ts: float,
) -> None
Observe audio_ttfp_s on a request's first audio packet.
Caller is responsible for the once-per-request guard (e.g. checking ClientRequestState.first_audio_ts is None) so this function fires at most once per request_id. The function skips defensively when replica_id or arrival_ts is missing: both can legitimately be absent in error paths, and dropping the sample is preferable to emitting it under the wrong (stage, replica) labels.
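The guard and the defensive skip described above can be sketched as follows. This is a minimal illustration, not the module's implementation: FakeRequestState, FakeTtfpRecorder, and on_audio_packet are hypothetical stand-ins, and computing the sample as now_ts - arrival_ts is an assumption inferred from the parameter names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FakeRequestState:
    """Hypothetical stand-in for ClientRequestState (only the fields used here)."""
    arrival_ts: Optional[float]
    first_audio_ts: Optional[float] = None


@dataclass
class FakeTtfpRecorder:
    """Hypothetical stand-in for the label-bound audio_ttfp_s histogram."""
    samples: List[Tuple[int, int, float]] = field(default_factory=list)

    def observe_ttfp(self, stage_id: int, replica_id: int, value_s: float) -> None:
        self.samples.append((stage_id, replica_id, value_s))


def on_audio_packet(
    state: FakeRequestState,
    recorder: FakeTtfpRecorder,
    *,
    stage_id: int,
    replica_id: Optional[int],
    now_ts: float,
) -> None:
    # Once-per-request guard: only the first audio packet gets past this check.
    if state.first_audio_ts is not None:
        return
    state.first_audio_ts = now_ts

    # Defensive skip: drop the sample rather than emit it with wrong labels.
    if replica_id is None or state.arrival_ts is None:
        return

    # Assumed definition: time from request arrival to first audio packet.
    recorder.observe_ttfp(stage_id, replica_id, now_ts - state.arrival_ts)
```

In this sketch a second packet for the same request is ignored by the guard, and a request whose replica is unknown marks first_audio_ts but records no sample.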
observe_audio_streaming_finalize ¶
observe_audio_streaming_finalize(
mod_metrics: OmniModalityMetrics,
*,
stage_id: int,
replica_id: int | None,
chunk_arrival_times_s: list[float],
chunk_bytes: list[int],
sample_rate: int,
threshold_s: float = AUDIO_CONTINUITY_DEFAULT_THRESHOLD_S,
) -> None
Emit audio_underrun_s + audio_continuity_ok_total at request end.
Reuses the math from vllm_omni.benchmarks.audio_continuity so the server-side and bench-side definitions stay aligned. Caller is responsible for collecting per-chunk arrival timestamps and byte sizes during the streaming response.
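To make the inputs concrete, here is one plausible way to turn per-chunk arrival times and byte sizes into an underrun figure. It is a sketch under stated assumptions, not the math in vllm_omni.benchmarks.audio_continuity: it assumes mono PCM at 2 bytes per sample, playback that starts when the first chunk arrives, and a continuity check that compares total underrun against threshold_s. The helper names playback_underrun_s and is_continuous are hypothetical.

```python
from typing import Sequence


def playback_underrun_s(
    chunk_arrival_times_s: Sequence[float],
    chunk_bytes: Sequence[int],
    sample_rate: int,
    bytes_per_sample: int = 2,  # assumption: 16-bit mono PCM
) -> float:
    """Total time a simulated player would stall waiting for the next chunk."""
    if (
        not chunk_arrival_times_s
        or len(chunk_arrival_times_s) != len(chunk_bytes)
        or sample_rate <= 0
    ):
        return 0.0

    underrun = 0.0
    # Playback is assumed to start as soon as the first chunk arrives.
    buffered_until = chunk_arrival_times_s[0]
    for arrival, nbytes in zip(chunk_arrival_times_s, chunk_bytes):
        if arrival > buffered_until:
            # The buffer ran dry before this chunk arrived.
            underrun += arrival - buffered_until
            buffered_until = arrival
        # Extend the buffer by the audio duration carried in this chunk.
        buffered_until += nbytes / (bytes_per_sample * sample_rate)
    return underrun


def is_continuous(underrun_s: float, threshold_s: float) -> bool:
    """Assumed rule: a request counts as continuous if stalls stay within threshold."""
    return underrun_s <= threshold_s
```

With three one-second chunks (32000 bytes each at 16 kHz) arriving at 0.0 s, 0.5 s, and 2.5 s, the buffer covers playback until 2.0 s, so the third chunk arrives half a second late and the sketch reports 0.5 s of underrun.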
observe_modality_at_finalize ¶
observe_modality_at_finalize(
mod_metrics: OmniModalityMetrics,
*,
output_type: str | None,
stage_id: int,
replica_id: int | None,
stage_metrics: Any,
engine_outputs: Any,
) -> None
Route audio-path observations for a finalized request.
Called by omni_base._process_single_result inside the e2e_done finalize guard, so it fires once per request. Skips the text path (covered by upstream vllm:*{stage="thinker", ...}) and any case where required inputs are missing, so the caller does not need to pre-validate.
audio_ttfp_s is intentionally NOT observed here; it is emitted by the streaming hook at first-packet time, not at finalize.
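The routing rule can be pictured as a single predicate evaluated before any audio observation. This is a hedged sketch only: should_observe_audio is a hypothetical helper, the literal "audio" as the output_type value is an assumption, and which inputs count as "required" in the real dispatcher may differ.

```python
from typing import Any, Optional


def should_observe_audio(
    output_type: Optional[str],
    replica_id: Optional[int],
    stage_metrics: Any,
    engine_outputs: Any,
) -> bool:
    """Return True only when audio-path metrics can be safely emitted."""
    # Text (and unknown) paths are skipped; upstream families cover text.
    if output_type != "audio":  # assumption: audio requests carry this value
        return False
    # Missing inputs are tolerated here so callers do not have to pre-validate.
    if replica_id is None or stage_metrics is None or engine_outputs is None:
        return False
    return True
```

Keeping the checks inside the dispatcher means the finalize hook can call it unconditionally for every finished request.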