vllm.v1.core.encoder_cache_manager
EncoderCacheManager
Source code in vllm/v1/core/encoder_cache_manager.py
allocate
can_allocate
free_encoder_input
Free a single encoder input id for the request.
get_cached_input_ids
get_freed_ids
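The method names above suggest an allocate/free lifecycle for cached encoder outputs. This page does not give their signatures, so the following is only a minimal sketch of that lifecycle: `ToyEncoderCache`, its constructor, and every parameter name are hypothetical stand-ins, not vLLM's `EncoderCacheManager`.

```python
class ToyEncoderCache:
    """Toy stand-in for illustration only; NOT vLLM's EncoderCacheManager."""

    def __init__(self, cache_size: int) -> None:
        # Assumption: capacity is counted in encoder tokens ("slots").
        self.num_free_slots = cache_size
        # request id -> {input id -> number of slots held}
        self.cached: dict[str, dict[int, int]] = {}
        # (request id, input id) pairs freed since the last get_freed_ids()
        self.freed: list[tuple[str, int]] = []

    def can_allocate(self, num_tokens: int) -> bool:
        """Return True if num_tokens slots are currently free."""
        return num_tokens <= self.num_free_slots

    def allocate(self, request_id: str, input_id: int, num_tokens: int) -> None:
        """Reserve slots for one encoder input of a request."""
        entries = self.cached.setdefault(request_id, {})
        if input_id in entries:
            return  # already cached; do not reserve twice
        if not self.can_allocate(num_tokens):
            raise ValueError("not enough free encoder cache slots")
        entries[input_id] = num_tokens
        self.num_free_slots -= num_tokens

    def get_cached_input_ids(self, request_id: str) -> set[int]:
        """Return the input ids currently cached for a request."""
        return set(self.cached.get(request_id, {}))

    def free_encoder_input(self, request_id: str, input_id: int) -> None:
        """Free a single encoder input id for the request."""
        entries = self.cached.get(request_id)
        if not entries or input_id not in entries:
            return
        self.num_free_slots += entries.pop(input_id)
        if not entries:
            del self.cached[request_id]
        self.freed.append((request_id, input_id))

    def get_freed_ids(self) -> list[tuple[str, int]]:
        """Return and clear the list of freed (request id, input id) pairs."""
        freed, self.freed = self.freed, []
        return freed


cache = ToyEncoderCache(cache_size=10)
if cache.can_allocate(6):
    cache.allocate("req-0", input_id=0, num_tokens=6)
cache.free_encoder_input("req-0", input_id=0)
print(cache.get_freed_ids())
```

Draining the freed list on read is a design choice of this sketch; it lets a caller pick up each freed entry exactly once.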
_compute_encoder_budget_multimodal
_compute_encoder_budget_multimodal(
    model_config: ModelConfig,
    scheduler_config: SchedulerConfig,
    mm_registry: MultiModalRegistry,
) -> tuple[int, int]
Compute the encoder cache budget based on the model and scheduler configurations for a multimodal model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_config` | `ModelConfig` | Model configuration. | required |
| `scheduler_config` | `SchedulerConfig` | Scheduler configuration. | required |
| `mm_registry` | `MultiModalRegistry` | Provides information about the token cost. | required |
Returns:
| Type | Description |
|---|---|
| `int` | |
| `int` | |
compute_encoder_budget
compute_encoder_budget(
    model_config: ModelConfig,
    scheduler_config: SchedulerConfig,
    mm_registry: MultiModalRegistry,
) -> tuple[int, int]
Compute the encoder cache budget based on the model and scheduler configurations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_config` | `ModelConfig` | Model configuration. | required |
| `scheduler_config` | `SchedulerConfig` | Scheduler configuration. | required |
| `mm_registry` | `MultiModalRegistry` | Provides information about the token cost. | required |
Returns:
| Type | Description |
|---|---|
| `int` | |
| `int` | |
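This page does not describe the two returned integers, so the sketch below rests on an assumption: the first is read as a per-step encoder compute budget and the second as an encoder cache capacity, both counted in tokens. `toy_encoder_budget` and its parameters are hypothetical and take plain integers rather than the real config objects. It shows one plausible rule, where each budget is raised so that the most expensive single multimodal item always fits.

```python
def toy_encoder_budget(
    max_encoder_input_tokens: int,
    encoder_cache_size: int,
    max_tokens_per_item: int,
) -> tuple[int, int]:
    """Toy stand-in for illustration only; NOT vLLM's compute_encoder_budget."""
    # Assumption: a budget smaller than the largest single item could never
    # schedule that item, so both values are clamped from below.
    compute_budget = max(max_encoder_input_tokens, max_tokens_per_item)
    cache_size = max(encoder_cache_size, max_tokens_per_item)
    return compute_budget, cache_size


# The result unpacks the same way as the tuple[int, int] documented above.
compute_budget, cache_size = toy_encoder_budget(2048, 2048, 4096)
print(compute_budget, cache_size)
```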