vllm_omni.metrics.definitions ¶
Single source of truth for vLLM-Omni Prometheus + bench CLI metric naming.
Consumed by:
- vllm_omni.metrics.prometheus (server-side /metrics pipeline families)
- vllm_omni.metrics.modality (audio families)
- vllm_omni.metrics.transfer (cross-stage transfer families)
- vllm_omni.benchmarks.metrics.metrics (bench CLI MultiModalsBenchmarkMetrics)
Naming conventions for the vllm_omni:* families exposed here: time-bearing metrics use the _s suffix (values in seconds), counters use _total (auto-suffixed by the prometheus client), sizes use _bytes.
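The suffix conventions can be illustrated with a small checker. This is only a sketch: `unit_of` is a hypothetical helper that does not exist in the module, and the literal prefix `vllm_omni:` is an assumption inferred from the family names mentioned above, not a confirmed value of METRIC_PREFIX.

```python
from typing import Optional

# Assumption: the string behind METRIC_PREFIX is "vllm_omni:".
ASSUMED_PREFIX = "vllm_omni:"


def unit_of(metric_name: str) -> Optional[str]:
    """Return the unit implied by a metric's suffix, or None if none applies.

    Hypothetical helper for illustration; not part of vllm_omni.
    """
    if not metric_name.startswith(ASSUMED_PREFIX):
        return None
    if metric_name.endswith("_total"):
        return "count"
    if metric_name.endswith("_bytes"):
        return "bytes"
    if metric_name.endswith("_s"):
        return "seconds"
    # Gauges such as num_requests_running carry no unit suffix.
    return None


print(unit_of(ASSUMED_PREFIX + "e2e_request_latency_s"))
print(unit_of(ASSUMED_PREFIX + "num_requests_running"))
```

The `_total` check comes first so that a counter name is never misread as a unit-bearing metric.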
AUDIO_CONTINUITY_LABELS module-attribute ¶
AUDIO_CONTINUITY_OK_METRIC module-attribute ¶
AUDIO_CONTINUITY_OK_METRIC = (
METRIC_PREFIX + AUDIO_CONTINUITY_OK
)
AUDIO_CONTINUITY_OK_RATE module-attribute ¶
AUDIO_CONTINUITY_OK_RATE = f'{AUDIO_CONTINUITY_OK}_rate'
AUDIO_SKIPPED_LABELS module-attribute ¶
AUDIO_SKIPPED_REQUESTS_METRIC module-attribute ¶
AUDIO_SKIPPED_REQUESTS_METRIC = (
METRIC_PREFIX + AUDIO_SKIPPED_REQUESTS
)
BYTES_BUCKETS module-attribute ¶
BYTES_BUCKETS = (
1024,
4096,
16384,
65536,
262144,
1048576,
4194304,
16777216,
67108864,
268435456,
)
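The listed bounds form a geometric series: each bucket is four times the previous one, from 1 KiB up to 256 MiB. The sketch below only verifies that pattern against the values shown above; whether the module generates the tuple this way or writes it out literally is not known.

```python
# Values copied from the BYTES_BUCKETS definition above.
BYTES_BUCKETS = (
    1024, 4096, 16384, 65536, 262144,
    1048576, 4194304, 16777216, 67108864, 268435456,
)

# Ten bounds, each 4x the previous, starting at 1 KiB.
generated = tuple(1024 * 4**i for i in range(10))

print(generated == BYTES_BUCKETS)
print(BYTES_BUCKETS[-1] // (1024 * 1024), "MiB")
```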
E2E_REQUEST_LATENCY_S module-attribute ¶
E2E_REQUEST_LATENCY_S = (
METRIC_PREFIX + "e2e_request_latency_s"
)
IMAGE_GENERATION_TIME_MS module-attribute ¶
IMAGE_GENERATION_TIME_MS = f'{IMAGE_GENERATION}_time_ms'
INTER_OUTPUT_LATENCIES_MS module-attribute ¶
MEAN_DENOISE_STEP_LATENCY_MS module-attribute ¶
MEAN_DENOISE_STEP_LATENCY_MS = (
f"mean_{DENOISE_STEP_LATENCY}_ms"
)
MEAN_IMAGE_GENERATION_MS module-attribute ¶
MEAN_IMAGE_GENERATION_MS = f'mean_{IMAGE_GENERATION}_ms'
MEDIAN_IMAGE_GENERATION_MS module-attribute ¶
MEDIAN_IMAGE_GENERATION_MS = f"median_{IMAGE_GENERATION}_ms"
MISSING_AUDIO_DURATION_COUNT module-attribute ¶
NUM_REQUESTS_RUNNING module-attribute ¶
NUM_REQUESTS_RUNNING = (
METRIC_PREFIX + "num_requests_running"
)
NUM_REQUESTS_WAITING module-attribute ¶
NUM_REQUESTS_WAITING = (
METRIC_PREFIX + "num_requests_waiting"
)
PERCENTILES_AUDIO_DURATION_S module-attribute ¶
PERCENTILES_AUDIO_DURATION_S = (
f"percentiles_{AUDIO_DURATION}_s"
)
PERCENTILES_AUDIO_TTFP_MS module-attribute ¶
PERCENTILES_AUDIO_TTFP_MS = f'percentiles_{AUDIO_TTFP}_ms'
PERCENTILES_AUDIO_UNDERRUN_S module-attribute ¶
PERCENTILES_AUDIO_UNDERRUN_S = (
f"percentiles_{AUDIO_UNDERRUN}_s"
)
PERCENTILES_IMAGE_GENERATION_MS module-attribute ¶
PERCENTILES_IMAGE_GENERATION_MS = (
f"percentiles_{IMAGE_GENERATION}_ms"
)
RTF_BUCKETS module-attribute ¶
SECONDS_BUCKETS module-attribute ¶
SECONDS_FAST_BUCKETS module-attribute ¶
SECONDS_FAST_BUCKETS = (
0.001,
0.005,
0.01,
0.025,
0.05,
0.1,
0.25,
0.5,
1.0,
2.5,
5.0,
10.0,
60.0,
)
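Prometheus histogram buckets are cumulative with inclusive upper bounds (`le`), plus an implicit `+Inf` bucket. As a rough illustration of how an observation maps onto these bounds, the sketch below finds the smallest bound that contains a value. `smallest_bucket` is a hypothetical helper, not part of the module.

```python
from bisect import bisect_left

# Values copied from the SECONDS_FAST_BUCKETS definition above.
SECONDS_FAST_BUCKETS = (
    0.001, 0.005, 0.01, 0.025, 0.05, 0.1,
    0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 60.0,
)


def smallest_bucket(value_s: float, buckets=SECONDS_FAST_BUCKETS) -> float:
    """Return the smallest upper bound with value_s <= bound, else +Inf."""
    # bisect_left returns the first index whose bound is >= value_s,
    # which matches the inclusive `le` semantics.
    idx = bisect_left(buckets, value_s)
    return buckets[idx] if idx < len(buckets) else float("inf")


print(smallest_bucket(0.03))
print(smallest_bucket(120.0))
```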
SERVING_TIME_TO_FIRST_OUTPUTS_MS module-attribute ¶
SERVING_TIME_TO_FIRST_OUTPUT_MS module-attribute ¶
TRANSFER_IN_FLIGHT_S module-attribute ¶
TRANSFER_IN_FLIGHT_S = (
METRIC_PREFIX + "transfer_in_flight_s"
)
TRANSFER_LABELS module-attribute ¶
compute_audio_rtf ¶
RTF = stage_gen_time / audio_content_duration.
The SLO red line is RTF < 1: to stream, audio must be generated faster than it plays back. Returns 0.0 when audio_duration_s is non-positive; the caller then decides whether to observe the sample, since the function must neither divide by zero nor emit negative samples.
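A minimal sketch of the documented behavior follows. The function name and the parameter name `stage_gen_time_s` are assumptions chosen for illustration; the real signature is not shown on this page.

```python
def compute_audio_rtf_sketch(stage_gen_time_s: float, audio_duration_s: float) -> float:
    """Real-time factor: generation time divided by audio content duration."""
    # Guard described in the docstring: non-positive duration yields 0.0
    # instead of a division by zero or a negative sample.
    if audio_duration_s <= 0:
        return 0.0
    return stage_gen_time_s / audio_duration_s


# 0.5 s of compute for 2 s of audio: below 1, so streaming can keep up.
print(compute_audio_rtf_sketch(0.5, 2.0))
# 3 s of compute for 2 s of audio: above 1, playback would stall.
print(compute_audio_rtf_sketch(3.0, 2.0))
```

Because 0.0 is also what a non-positive duration returns, a caller that wants to skip such samples would need to check the duration itself rather than rely on the return value.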
compute_denoise_step_latency ¶
Mean denoise step latency = image stage generation time / step count.
The returned value uses the same time unit as stage_gen_time.
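The formula can be sketched as below. The function and parameter names are hypothetical, and the handling of a non-positive step count is an assumption: the page does not say what the real function does in that case.

```python
def denoise_step_latency_sketch(stage_gen_time: float, num_steps: int) -> float:
    """Mean per-step latency, in the same time unit as stage_gen_time."""
    # Assumption: return 0.0 for a non-positive step count. The documented
    # behavior for this case is not stated.
    if num_steps <= 0:
        return 0.0
    return stage_gen_time / num_steps


# 2000 ms over 50 steps gives a result in ms.
print(denoise_step_latency_sketch(2000.0, 50))
# The same stage time in seconds gives a result in seconds.
print(denoise_step_latency_sketch(2.0, 50))
```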
resolve_audio_sample_rate ¶
Extract audio sample_rate from a dict or config object, with fallbacks.
Tries the same key chain as serving_chat.py's audio response path so /metrics audio_duration_s = audio_frames / sample_rate stays consistent with what the OpenAI streaming endpoint reports back to clients. Also accepts config objects that expose the same values as attributes. Returns DEFAULT_AUDIO_SAMPLE_RATE when no usable value is present.
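The lookup pattern described above might look roughly like this. Everything specific here is an assumption: the key names in `ASSUMED_KEYS`, the default of 24000 Hz, and the function names are placeholders, since the page lists neither the actual key chain nor the value of DEFAULT_AUDIO_SAMPLE_RATE.

```python
from collections.abc import Mapping
from types import SimpleNamespace
from typing import Any

ASSUMED_DEFAULT_SAMPLE_RATE = 24_000             # hypothetical placeholder value
ASSUMED_KEYS = ("sample_rate", "sampling_rate")  # hypothetical key chain


def resolve_sample_rate_sketch(source: Any) -> int:
    """Try each key on a dict, or each attribute on an object, in order."""
    for key in ASSUMED_KEYS:
        if isinstance(source, Mapping):
            value = source.get(key)
        else:
            value = getattr(source, key, None)
        # Accept only positive numbers; bool is excluded explicitly
        # because it is a subclass of int.
        if isinstance(value, (int, float)) and not isinstance(value, bool) and value > 0:
            return int(value)
    return ASSUMED_DEFAULT_SAMPLE_RATE


def audio_duration_s(audio_frames: int, sample_rate: int) -> float:
    """Duration formula quoted in the docstring above."""
    return audio_frames / sample_rate


rate = resolve_sample_rate_sketch(SimpleNamespace(sampling_rate=44100))
print(rate, audio_duration_s(88200, rate))
```

Falling back to a default rather than raising keeps the derived duration defined even when a stage omits the rate, at the cost of a possibly wrong duration if the default does not match the actual audio.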