llmcompressor.pipelines.sequential.helpers
Classes:
- Subgraph – Dataclass specifying an executable subgraph of a model graph
Functions:
- handle_sequential_oom – Catch out-of-memory errors (OOMs) and suggest changing sequential targets
- trace_subgraphs – Trace a model to produce subgraphs, where each sequential target belongs to exactly one subgraph
SequentialTracer
Bases: HFTracer
Get a tracer specialized for the given model. The resulting tracer will not trace inside sequential targets, nor inside any modules which are not call graph ancestors of sequential targets.
Parameters:
- ancestors (set[Module]) – modules which are ancestors of sequential targets
Source code in src/llmcompressor/pipelines/sequential/helpers.py
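As a rough illustration of the leaf rule above, the sketch below simulates tracing over a toy module tree keyed by dotted module names. It does not use HFTracer; the tree layout, the module names, and the trace_calls helper are all hypothetical. Only modules in ancestors are descended into, and every other module (including the sequential targets themselves) is recorded as a single opaque call.

```python
def trace_calls(tree: dict[str, list[str]], name: str, ancestors: set[str]) -> list[str]:
    """Hypothetical sketch: list the module calls a leaf-aware tracer would record.

    Modules in `ancestors` are traced through (their children are visited);
    any other module is treated as a leaf and recorded as one call.
    """
    if name not in ancestors:
        # Leaf: either a sequential target or a module that never reaches one
        return [name]
    calls: list[str] = []
    for child in tree.get(name, []):
        calls.extend(trace_calls(tree, child, ancestors))
    return calls


# Toy model: root -> embed, layers (layers.0, layers.1), lm_head
tree = {
    "": ["embed", "layers", "lm_head"],
    "layers": ["layers.0", "layers.1"],
    "layers.0": ["layers.0.attn", "layers.0.mlp"],
    "layers.1": ["layers.1.attn", "layers.1.mlp"],
}

# The decoder layers are the sequential targets; their ancestors are "" and "layers"
print(trace_calls(tree, "", ancestors={"", "layers"}))
# ['embed', 'layers.0', 'layers.1', 'lm_head']
```

Note that the attention and MLP submodules never appear: once a target is reached, its internals stay opaque, which is what keeps each target as one node in the traced graph.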
Subgraph
dataclass
Subgraph(
graph: Graph,
input_names: set[str],
consumed_names: set[str],
_code: PythonCode | None = None,
)
Dataclass specifying an executable subgraph of a model graph
Parameters:
- graph (Graph) – subgraph of model graph
- input_names (set[str]) – argument names of the compiled forward function
- consumed_names (set[str]) – argument names which are not used by any subsequent subgraphs and can therefore be deleted from the intermediates cache
Methods:
- forward – Execute the operations within the subgraph
forward
Execute the operations within the subgraph
Parameters:
- *args – argument inputs to subgraph forward function
- **kwargs – keyword inputs to subgraph forward function
Returns:
- dict[str, Any] – mapping of output node names to their computed values
Source code in src/llmcompressor/pipelines/sequential/helpers.py
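A minimal sketch of how subgraphs might be executed in order, assuming each forward returns a dict of outputs and that consumed names are evicted from the intermediates cache after each step. ToySubgraph and run_sequentially are hypothetical: ToySubgraph wraps a plain callable, whereas the real Subgraph executes code generated from its graph.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToySubgraph:
    """Hypothetical stand-in for Subgraph: `fn` replaces the compiled graph code."""

    fn: Callable[..., dict[str, Any]]
    input_names: set[str]
    consumed_names: set[str] = field(default_factory=set)

    def forward(self, *args, **kwargs) -> dict[str, Any]:
        # Returns a mapping of output names to computed values
        return self.fn(*args, **kwargs)


def run_sequentially(subgraphs: list[ToySubgraph], inputs: dict[str, Any]) -> dict[str, Any]:
    cache = dict(inputs)  # intermediates cache, keyed by value name
    for subgraph in subgraphs:
        kwargs = {name: cache[name] for name in subgraph.input_names}
        cache.update(subgraph.forward(**kwargs))
        # Values that no later subgraph reads can be freed immediately
        for name in subgraph.consumed_names:
            del cache[name]
    return cache


first = ToySubgraph(lambda x: {"h": x + 1}, input_names={"x"}, consumed_names={"x"})
second = ToySubgraph(lambda h: {"out": h * 2}, input_names={"h"}, consumed_names={"h"})
print(run_sequentially([first, second], {"x": 3}))  # {'out': 8}
```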
find_target_nodes
Find all nodes whose execution is equivalent to executing the target modules. Note that these nodes are guaranteed to be treated as leaf nodes by SequentialTracer.
Parameters:
- graph (GraphModule) – graph containing target nodes
- targets (set[Module]) – modules whose nodes are being searched for
Returns:
- set[Node] – set of all nodes which call the target modules
Source code in src/llmcompressor/pipelines/sequential/helpers.py
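Because targets are leaves, each one shows up as a single module-call node. The sketch below models that lookup on hypothetical ToyNode records carrying only an op kind and a target string, mirroring the torch.fx convention that a "call_module" node's target is the module's qualified name. toy_find_target_nodes is an illustrative stand-in that matches by name, not the library function, which receives module objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyNode:
    """Hypothetical stand-in for torch.fx.Node: only `op` and `target` are modeled."""

    name: str
    op: str
    target: str


def toy_find_target_nodes(nodes: list[ToyNode], target_names: set[str]) -> set[ToyNode]:
    # A node executes a target module when it is a call_module node
    # whose target is that module's qualified name
    return {node for node in nodes if node.op == "call_module" and node.target in target_names}


nodes = [
    ToyNode("input_ids", "placeholder", "input_ids"),
    ToyNode("embed", "call_module", "embed"),
    ToyNode("layers_0", "call_module", "layers.0"),
    ToyNode("add", "call_function", "add"),
    ToyNode("layers_1", "call_module", "layers.1"),
]
found = toy_find_target_nodes(nodes, {"layers.0", "layers.1"})
print(sorted(node.name for node in found))  # ['layers_0', 'layers_1']
```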
get_sequential_ancestors
Find modules which are call graph ancestors of the given sequential targets
Parameters:
- model (Module) – model containing sequential targets
- targets (set[Module]) – sequential targets to find ancestors of
Returns:
- set[Module] – call graph ancestors of sequential targets
Source code in src/llmcompressor/pipelines/sequential/helpers.py
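A simplified sketch of the ancestor computation, under the assumption that the call graph mirrors the module hierarchy, so the ancestors of a target are just the proper prefixes of its dotted name. That assumption need not hold in general (a parent can call a module it does not own), and module_ancestors is a hypothetical helper working on names rather than Module objects.

```python
def module_ancestors(target_names: set[str]) -> set[str]:
    """Hypothetical sketch: collect every proper prefix of each dotted module name.

    The empty string stands for the root model.
    """
    ancestors: set[str] = set()
    for name in target_names:
        parts = name.split(".")
        # "", "model", "model.layers", ... but not the target itself
        for depth in range(len(parts)):
            ancestors.add(".".join(parts[:depth]))
    return ancestors


print(sorted(module_ancestors({"model.layers.0", "model.layers.1"})))
# ['', 'model', 'model.layers']
```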
graph_is_well_formed
A graph is well formed if and only if, for every pair of nodes nodeA and nodeB:
nodeA in nodeB.users <=> nodeB in nodeA.all_input_nodes
Parameters:
- graph (Graph) – graph being checked
Returns:
- bool – True if the graph is well formed, False otherwise
Source code in src/llmcompressor/pipelines/sequential/helpers.py
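The biconditional above can be checked directly by comparing both edge directions for every pair of nodes. This sketch uses a hypothetical ToyNode class with explicit users and all_input_nodes lists in place of torch.fx.Node; the quadratic pair loop is chosen for clarity, not speed.

```python
class ToyNode:
    """Hypothetical stand-in for torch.fx.Node with explicit edge lists."""

    def __init__(self, name: str):
        self.name = name
        self.users: list["ToyNode"] = []  # nodes that consume this node's output
        self.all_input_nodes: list["ToyNode"] = []  # nodes this node consumes


def is_well_formed(nodes: list[ToyNode]) -> bool:
    for node_a in nodes:
        for node_b in nodes:
            # nodeA in nodeB.users <=> nodeB in nodeA.all_input_nodes
            if (node_a in node_b.users) != (node_b in node_a.all_input_nodes):
                return False
    return True


x, y = ToyNode("x"), ToyNode("y")
x.users.append(y)
y.all_input_nodes.append(x)
print(is_well_formed([x, y]))  # True
y.all_input_nodes.clear()  # break the back-edge
print(is_well_formed([x, y]))  # False
```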
handle_sequential_oom
Catch out-of-memory errors (OOMs) and suggest changing sequential targets
Source code in src/llmcompressor/pipelines/sequential/helpers.py
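One plausible shape for this kind of helper is a decorator that re-raises the error with a hint attached. The sketch below is hypothetical throughout: handle_oom_sketch is an invented name, the message wording is invented, and Python's built-in MemoryError stands in for a GPU out-of-memory error so the example runs without a GPU.

```python
import functools
from typing import Callable, TypeVar

T = TypeVar("T")


def handle_oom_sketch(func: Callable[..., T]) -> Callable[..., T]:
    """Hypothetical decorator: re-raise OOMs with a hint about sequential targets."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs) -> T:
        try:
            return func(*args, **kwargs)
        except MemoryError as err:
            # Chain the original error so its traceback is preserved
            raise MemoryError(
                "Ran out of memory while executing a subgraph. Consider choosing "
                "smaller modules as sequential targets so that each subgraph "
                "holds fewer weights and activations."
            ) from err

    return wrapper


@handle_oom_sketch
def calibrate() -> None:
    raise MemoryError("simulated allocation failure")
```

Calling calibrate() raises a MemoryError whose message mentions sequential targets and whose __cause__ is the original error.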
partition_graph
Convert each partition into a Subgraph. Each Subgraph returns a dictionary mapping
of output node names to their computed values. Note that the consumed_names
attribute of each Subgraph remains empty, to be later populated by
trace_consumed_names.
Parameters:
- model (Module) – model which owns the produced Subgraphs
- partitions (list[list[Node]]) – list of partitions, where each partition is a list of nodes belonging to that partition
Returns:
- list[Subgraph] – list of subgraphs in order of execution
Source code in src/llmcompressor/pipelines/sequential/helpers.py
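Turning a partition into a Subgraph requires knowing which values it reads from earlier partitions (its inputs) and which of its values later partitions read (its outputs). The bookkeeping can be sketched on a toy dependency map; this is an assumption about the kind of analysis involved rather than the library's code, and partition_io and the node names are hypothetical.

```python
def partition_io(
    inputs_of: dict[str, list[str]], partitions: list[list[str]]
) -> list[tuple[set[str], set[str]]]:
    """Hypothetical sketch: find each partition's input and output value names.

    `inputs_of` maps a node name to the names of the nodes it consumes.
    """
    owner = {node: i for i, part in enumerate(partitions) for node in part}
    io = []
    for i, part in enumerate(partitions):
        # Inputs: values read here but produced in another partition
        ins = {src for node in part for src in inputs_of[node] if owner[src] != i}
        # Outputs: values produced here and read in another partition
        outs = {
            node
            for node in part
            for other, srcs in inputs_of.items()
            if node in srcs and owner[other] != i
        }
        io.append((ins, outs))
    return io


inputs_of = {"x": [], "h0": ["x"], "h1": ["h0"], "out": ["h1", "x"]}
io = partition_io(inputs_of, [["x", "h0"], ["h1", "out"]])
print([(sorted(ins), sorted(outs)) for ins, outs in io])
# [([], ['h0', 'x']), (['h0', 'x'], [])]
```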
populate_concrete_args
Create concrete args for tracing. Unlike the equivalent function provided by transformers.utils.fx, this also creates default values for variadic arguments, which some models require.
Parameters:
- model (Module) – model being traced
- sample_input (dict) – values used to symbolically trace the model. All arguments to the model.forward function which are not in sample_input are considered concrete args
Returns:
- dict – dictionary mapping concrete argument names to their default values
Source code in src/llmcompressor/pipelines/sequential/helpers.py
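The idea can be sketched with inspect.signature: every forward parameter that is not in sample_input gets a concrete default, with empty containers for *args and **kwargs. concrete_args_sketch is a hypothetical helper working on a plain function, and the fallback of None for required parameters without defaults is an assumption of this sketch, not documented behavior.

```python
import inspect
from typing import Any, Callable


def concrete_args_sketch(forward: Callable[..., Any], sample_input: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch: default values for every argument absent from sample_input."""
    concrete: dict[str, Any] = {}
    for name, param in inspect.signature(forward).parameters.items():
        if name in sample_input:
            continue  # traced symbolically, so not concrete
        if param.kind == inspect.Parameter.VAR_POSITIONAL:
            concrete[name] = ()  # *args defaults to an empty tuple
        elif param.kind == inspect.Parameter.VAR_KEYWORD:
            concrete[name] = {}  # **kwargs defaults to an empty dict
        elif param.default is not inspect.Parameter.empty:
            concrete[name] = param.default
        else:
            concrete[name] = None  # assumption: required args without defaults get None
    return concrete


def forward(input_ids, attention_mask=None, use_cache=True, *args, **kwargs):
    ...


print(concrete_args_sketch(forward, {"input_ids": [1, 2, 3]}))
# {'attention_mask': None, 'use_cache': True, 'args': (), 'kwargs': {}}
```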
topological_partition
topological_partition(
graph: GraphModule,
targets: set[Module],
targets_per_subgraph: int = 1,
) -> list[list[Node]]
Partition the graph into partitions such that each target belongs to exactly one
partition and executing each partition depends only on intermediate values produced
by executing the partitions before it.
Parameters:
- graph (GraphModule) – graph being partitioned
- targets (set[Module]) – target modules which will be assigned to disjoint partitions
- targets_per_subgraph (int, default: 1) – number of targets to include per subgraph
Returns:
- list[list[Node]] – list of partitions, where each partition is a list of nodes belonging to that partition
Source code in src/llmcompressor/pipelines/sequential/helpers.py
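One way to satisfy both properties is Kahn's algorithm: emit nodes in topological order and start a new partition right after every targets_per_subgraph targets. Because partition indices only grow along a topological order, every node's inputs land in the same or an earlier partition. The sketch below works on a toy dependency map; topo_partition is hypothetical and omits any special handling the library may apply to particular node kinds.

```python
from collections import deque


def topo_partition(
    inputs_of: dict[str, list[str]], targets: set[str], targets_per_subgraph: int = 1
) -> list[list[str]]:
    """Hypothetical sketch of target-based partitioning with Kahn's algorithm."""
    users: dict[str, list[str]] = {node: [] for node in inputs_of}
    for node, srcs in inputs_of.items():
        for src in srcs:
            users[src].append(node)
    indegree = {node: len(srcs) for node, srcs in inputs_of.items()}
    queue = deque(node for node, deg in indegree.items() if deg == 0)

    partitions: list[list[str]] = [[]]
    seen_targets = 0
    while queue:
        node = queue.popleft()
        partitions[-1].append(node)
        if node in targets:
            seen_targets += 1
            if seen_targets % targets_per_subgraph == 0:
                partitions.append([])  # close the partition right after a target
        for user in users[node]:
            indegree[user] -= 1
            if indegree[user] == 0:
                queue.append(user)
    if not partitions[-1]:
        partitions.pop()  # drop a trailing empty partition
    return partitions


graph = {"x": [], "layer0": ["x"], "layer1": ["layer0"], "norm": ["layer1"]}
print(topo_partition(graph, {"layer0", "layer1"}))
# [['x', 'layer0'], ['layer1'], ['norm']]
```

With targets_per_subgraph=2, both layers share the first partition and only norm is left for the second.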
trace_consumed_names
Populate the consumed_names attribute of each Subgraph according to when inputs
are last used, in order to vacate the intermediates cache and save memory.
Parameters:
- subgraphs (list[Subgraph]) – list of subgraphs with empty consumed_names attributes
Source code in src/llmcompressor/pipelines/sequential/helpers.py
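"Last used" can be found with a single backward pass: walking the subgraphs from last to first, the first time a name appears is its final use, so that subgraph is the one that consumes it. The sketch below operates on plain sets of input names; consumed_names_sketch is a hypothetical helper rather than the library function, which mutates Subgraph objects in place.

```python
def consumed_names_sketch(input_names: list[set[str]]) -> list[set[str]]:
    """Hypothetical sketch: mark each input as consumed by the last subgraph that reads it.

    `input_names[i]` holds the argument names of subgraph i, in execution order.
    """
    consumed: list[set[str]] = [set() for _ in input_names]
    already_claimed: set[str] = set()
    # Walk backwards so the first sighting of a name is its last use
    for index in reversed(range(len(input_names))):
        for name in input_names[index]:
            if name not in already_claimed:
                consumed[index].add(name)
                already_claimed.add(name)
    return consumed


result = consumed_names_sketch([{"x"}, {"x", "h0"}, {"h1"}])
print([sorted(names) for names in result])  # [[], ['h0', 'x'], ['h1']]
```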
trace_subgraphs
trace_subgraphs(
model: PreTrainedModel,
sample_input: dict[str, Any],
sequential_targets: list[str],
ignore: list[str],
targets_per_subgraph: int = 1,
) -> list[Subgraph]
Trace a model to produce subgraphs, where each sequential target belongs to exactly one subgraph and where executing each subgraph in order is equivalent to executing the original model
Parameters:
- model (PreTrainedModel) – model being traced
- sample_input (dict[str, Any]) – inputs whose values will change during execution but whose len, bool, and contains values are assumed constant across batches
- sequential_targets (list[str]) – list of patterns matching sequential targets
- ignore (list[str]) – function and method names to skip during tracing
- targets_per_subgraph (int, default: 1) – number of targets to include per subgraph
Returns:
- list[Subgraph] – a list of Subgraphs in order of execution
Source code in src/llmcompressor/pipelines/sequential/helpers.py