vllm_omni.utils.speaker_cache ¶

Process-wide thread-safe LRU cache for speaker extraction artifacts.

Keyed by `(model_type, speaker_name, created_at)` so each upload generation has its own slot. Access via `get_speaker_cache()`.

logger `module-attribute` ¶

logger = init_logger(__name__)

SpeakerEmbeddingCache ¶

Thread-safe in-memory LRU cache for speaker extraction artifacts.
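The LRU-plus-lock behavior described above can be illustrated with a minimal sketch. `TinyLRU` and its `max_entries` parameter are hypothetical stand-ins and are not the `SpeakerEmbeddingCache` implementation, which may bound itself differently (for example by memory rather than by entry count).

```python
import threading
from collections import OrderedDict
from typing import Any, Optional, Tuple

Key = Tuple[str, str, int]  # assumed layout: (model_type, speaker_name, created_at)


class TinyLRU:
    """Hypothetical stand-in for a thread-safe LRU cache of artifacts."""

    def __init__(self, max_entries: int = 2) -> None:
        self._lock = threading.Lock()
        self._data: "OrderedDict[Key, dict]" = OrderedDict()
        self._max_entries = max_entries

    def get(self, key: Key) -> Optional[dict]:
        with self._lock:
            value = self._data.get(key)
            if value is not None:
                # A hit makes the entry the most recently used one.
                self._data.move_to_end(key)
            return value

    def put(self, key: Key, artifacts: dict) -> None:
        with self._lock:
            self._data[key] = artifacts
            self._data.move_to_end(key)
            # Evict from the least recently used end until within bounds.
            while len(self._data) > self._max_entries:
                self._data.popitem(last=False)


cache = TinyLRU(max_entries=2)
cache.put(("demo_model", "alice", 0), {"embedding": [0.1, 0.2]})
cache.put(("demo_model", "bob", 0), {"embedding": [0.3, 0.4]})
cache.get(("demo_model", "alice", 0))  # touch alice so bob becomes the oldest
cache.put(("demo_model", "carol", 0), {"embedding": [0.5, 0.6]})  # evicts bob
```

Holding a single lock around both the lookup and the reordering keeps the recency order consistent when several threads hit the cache at once.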

clear ¶

clear(speaker_name: str | None = None) -> int

Remove entries. With a name, drops matches across model types and generations.
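As a rough sketch of what clearing by name across model types and generations could look like, the hypothetical helper below works on a plain dict. It assumes the key layout `(model_type, speaker_name, created_at)`, that calling without a name removes everything, and that the returned `int` is the number of entries removed; none of these is confirmed by this page.

```python
from typing import Any, Dict, Optional, Tuple

Key = Tuple[str, str, int]  # assumed layout: (model_type, speaker_name, created_at)


def clear_sketch(entries: Dict[Key, Any], speaker_name: Optional[str] = None) -> int:
    """Hypothetical helper: drop entries and return how many were removed."""
    if speaker_name is None:
        removed = len(entries)
        entries.clear()
        return removed
    # Normalize the same way keys are assumed to be built.
    target = speaker_name.strip().lower()
    doomed = [key for key in entries if key[1] == target]
    for key in doomed:
        del entries[key]
    return len(doomed)


entries = {
    ("model_a", "alice", 0): {"x": 1},
    ("model_b", "alice", 1700000000): {"x": 2},
    ("model_a", "bob", 0): {"x": 3},
}
removed_for_alice = clear_sketch(entries, "Alice")
```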

get ¶

get(key: tuple[str, str, int]) -> dict[str, Any] | None

make_cache_key `staticmethod` ¶

make_cache_key(
    speaker_name: str, model_type: str, created_at: int = 0
) -> tuple[str, str, int]

Build a cache key. Use `created_at=0` for built-in speakers, which have no upload.

Names are normalized (stripped and lowercased) so that delete/clear paths, which normalize to lowercase, still match entries that were put with mixed-case names.
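The normalization can be sketched as follows. `make_cache_key_sketch` is a hypothetical stand-in, and the element order of the returned tuple is an assumption taken from the module description, `(model_type, speaker_name, created_at)`; the real `make_cache_key` may differ.

```python
from typing import Tuple


def make_cache_key_sketch(
    speaker_name: str, model_type: str, created_at: int = 0
) -> Tuple[str, str, int]:
    """Hypothetical stand-in: strip and lowercase the name, then build the key."""
    normalized_name = speaker_name.strip().lower()
    # Assumed tuple order: (model_type, speaker_name, created_at).
    return (model_type, normalized_name, created_at)


built_in_key = make_cache_key_sketch("  Alice ", "demo_model")
uploaded_key = make_cache_key_sketch("ALICE", "demo_model", created_at=1700000000)
```

Both keys share the normalized name `"alice"`, so a name-based clear would match them, while the differing `created_at` keeps the upload generation in its own slot.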

memory_bytes ¶

memory_bytes() -> int

put ¶

put(
    key: tuple[str, str, int], artifacts: dict[str, Any]
) -> None

stats ¶

stats() -> dict[str, Any]

get_speaker_cache ¶

get_speaker_cache() -> SpeakerEmbeddingCache

Return the process-wide speaker cache singleton.

iter_custom_voice_profiles ¶

iter_custom_voice_profiles(
    custom_voice_dir: str | PathLike[str] | None,
    *,
    expected_model_type: str | None = None,
) -> list[dict[str, Any]]

load_validated_profile_tensors ¶

load_validated_profile_tensors(
    profile: dict[str, Any],
    *,
    expected_model_type: str,
    qwen3_embedding_dim: int | None = None,
) -> dict[str, Tensor] | None