vllm.forward_context
batchsize_logging_interval
module-attribute
batchsize_logging_interval: float = (
VLLM_LOG_BATCHSIZE_INTERVAL
)
DPMetadata
dataclass
Source code in vllm/forward_context.py
ForwardContext
dataclass
Source code in vllm/forward_context.py
attn_metadata
instance-attribute
attn_metadata: Union[
AttentionMetadata, dict[str, AttentionMetadata]
]
no_compile_layers
instance-attribute
Describes attn_metadata, which is set dynamically for each forward pass. In v0 its type is AttentionMetadata. In v1 its type is dict[str, AttentionMetadata], a map from the layer_name of each attention layer to that layer's attention metadata.
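Because attn_metadata can take either form, code that reads it has to handle both. The following is a minimal sketch of that branching, not code from vLLM: StubAttentionMetadata, metadata_for_layer, and the layer name "layers.0.attn" are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class StubAttentionMetadata:
    """Hypothetical stand-in for AttentionMetadata (illustration only)."""
    num_tokens: int


def metadata_for_layer(
    attn_metadata: Union[StubAttentionMetadata, dict[str, StubAttentionMetadata]],
    layer_name: str,
) -> StubAttentionMetadata:
    """Return the metadata that applies to one attention layer."""
    if isinstance(attn_metadata, dict):
        # v1-style: one entry per attention layer, keyed by layer_name.
        return attn_metadata[layer_name]
    # v0-style: a single object shared by every layer.
    return attn_metadata


# Both shapes resolve to a single metadata object for the requested layer.
shared = StubAttentionMetadata(num_tokens=8)
per_layer = {"layers.0.attn": StubAttentionMetadata(num_tokens=8)}

v0_result = metadata_for_layer(shared, "layers.0.attn")
v1_result = metadata_for_layer(per_layer, "layers.0.attn")
```

A lookup for a layer name missing from the v1-style dict raises KeyError in this sketch; whether that is the right behavior depends on the caller.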
__init__
__init__(
no_compile_layers: dict[str, Any],
attn_metadata: Union[
AttentionMetadata, dict[str, AttentionMetadata]
],
virtual_engine: int,
dp_metadata: Optional[DPMetadata] = None,
) -> None
set_forward_context
set_forward_context(
attn_metadata: Any,
vllm_config: VllmConfig,
virtual_engine: int = 0,
num_tokens: int = 0,
)
A context manager that stores the current forward context (for example, attention metadata) for the duration of a model forward pass. It is also the place to inject logic that is common to every model forward pass.
Source code in vllm/forward_context.py