vllm_omni.diffusion.lora.layers ¶
Modules:
| Name | Description |
|---|---|
| base_linear | |
| column_parallel_linear | |
| replicated_linear | |
| row_parallel_linear | |
DiffusionBaseLinearLayerWithLoRA ¶
Bases: BaseLinearLayerWithLoRA
Diffusion-specific base class that overrides apply() to use a direct torch matmul instead of punica_wrapper.
punica_wrapper exists to manage multiple LoRA slots and slices efficiently; this override bypasses it and covers the single-LoRA case.
The computation matches the semantics of PunicaWrapperGPU.add_lora_linear():
- Shrink: buffer = x @ lora_a.T
- Expand: y += buffer @ lora_b.T
All other functionality (weight management, TP slicing, forward logic) is inherited from vLLM's BaseLinearLayerWithLoRA.
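The shrink and expand steps can be illustrated with a minimal sketch. It uses NumPy as a stand-in for the torch operations, and the function name `lora_linear_delta` and the tensor shapes are assumptions inferred from the two formulas, not taken from the library:

```python
import numpy as np


def lora_linear_delta(x, lora_a, lora_b):
    """Return the LoRA contribution for one adapter (hypothetical helper).

    Assumed shapes, inferred from the shrink/expand formulas:
      x:      (num_tokens, in_features)
      lora_a: (rank, in_features)
      lora_b: (out_features, rank)
    """
    buffer = x @ lora_a.T     # shrink: (num_tokens, rank)
    return buffer @ lora_b.T  # expand: (num_tokens, out_features)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
lora_a = rng.standard_normal((2, 8))
lora_b = rng.standard_normal((6, 2))
base_out = rng.standard_normal((4, 6))  # stands in for the base layer output

y = base_out.copy()
y += lora_linear_delta(x, lora_a, lora_b)  # y += buffer @ lora_b.T
```

Under these assumptions the two small matmuls are equivalent to adding `x @ (lora_b @ lora_a).T`, but the full-size product `lora_b @ lora_a` is never materialized.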
apply ¶
Override: use a simple matmul instead of punica_wrapper.add_lora_linear().
This matches the exact computation in PunicaWrapperGPU.add_lora_linear() for the single-LoRA case. For packed projections (e.g. fused QKV), we apply LoRA per-slice using output_slices.
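The per-slice behavior for packed projections might look like the sketch below. It again uses NumPy in place of torch, and it assumes that `output_slices` is a sequence of output widths (one per packed projection) and that each slice has its own `lora_a`/`lora_b` pair. The function name and argument layout are hypothetical, not the library's signature:

```python
import numpy as np


def apply_lora_per_slice(x, y, lora_a_list, lora_b_list, output_slices):
    """Add each slice's LoRA delta into its own column range of y (in place)."""
    offset = 0
    for lora_a, lora_b, width in zip(lora_a_list, lora_b_list, output_slices):
        buffer = x @ lora_a.T                              # shrink
        y[:, offset:offset + width] += buffer @ lora_b.T   # expand into this slice
        offset += width
    return y


rng = np.random.default_rng(0)
in_features, rank = 8, 2
output_slices = (4, 2, 2)  # e.g. Q, K, V widths of a fused QKV projection
x = rng.standard_normal((3, in_features))
lora_a_list = [rng.standard_normal((rank, in_features)) for _ in output_slices]
lora_b_list = [rng.standard_normal((width, rank)) for width in output_slices]

y = np.zeros((3, sum(output_slices)))  # stands in for the fused base output
apply_lora_per_slice(x, y, lora_a_list, lora_b_list, output_slices)
```

Each slice only touches its own columns of `y`, so the Q, K, and V adapters stay independent even though the base projection is fused.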
DiffusionColumnParallelLinearWithLoRA ¶
Bases: DiffusionBaseLinearLayerWithLoRA, ColumnParallelLinearWithLoRA
Diffusion ColumnParallelLinear with LoRA. DiffusionBaseLinearLayerWithLoRA is listed first in the bases, so its apply() takes priority.
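Why the base order gives apply() priority can be shown with Python's method resolution order. The classes below are toy stand-ins that reuse the documented names for readability; their bodies are assumptions and do not reflect the real implementations:

```python
class BaseLinearLayerWithLoRA:
    def apply(self):
        return "punica_wrapper path"


class DiffusionBaseLinearLayerWithLoRA(BaseLinearLayerWithLoRA):
    def apply(self):
        return "direct matmul path"


class ColumnParallelLinearWithLoRA(BaseLinearLayerWithLoRA):
    # Toy assumption: this class also provides its own apply().
    def apply(self):
        return "punica_wrapper path"


class DiffusionColumnParallelLinearWithLoRA(
    DiffusionBaseLinearLayerWithLoRA, ColumnParallelLinearWithLoRA
):
    pass


layer = DiffusionColumnParallelLinearWithLoRA()
mro_names = [cls.__name__ for cls in type(layer).__mro__]
result = layer.apply()  # resolved on the first base that defines apply()
```

Because the diffusion base appears first in the bases tuple, it precedes the vLLM layer class in the MRO, and the attribute lookup for `apply` stops there. Swapping the two bases would reverse that outcome in this toy model.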
DiffusionMergedColumnParallelLinearWithLoRA ¶
Bases: DiffusionBaseLinearLayerWithLoRA, MergedColumnParallelLinearWithLoRA
Diffusion MergedColumnParallelLinear (gate_up_proj) with LoRA. DiffusionBaseLinearLayerWithLoRA is listed first in the bases, so its apply() takes priority.
DiffusionMergedQKVParallelLinearWithLoRA ¶
Bases: DiffusionBaseLinearLayerWithLoRA, MergedQKVParallelLinearWithLoRA
Diffusion MergedQKVParallelLinear (to_qkv) with three LoRAs. DiffusionBaseLinearLayerWithLoRA is listed first in the bases, so its apply() takes priority.
DiffusionQKVParallelLinearWithLoRA ¶
Bases: DiffusionBaseLinearLayerWithLoRA, QKVParallelLinearWithLoRA
Diffusion QKVParallelLinear with a single LoRA. DiffusionBaseLinearLayerWithLoRA is listed first in the bases, so its apply() takes priority.
DiffusionReplicatedLinearWithLoRA ¶
Bases: DiffusionBaseLinearLayerWithLoRA, ReplicatedLinearWithLoRA
Diffusion ReplicatedLinear with LoRA. DiffusionBaseLinearLayerWithLoRA is listed first in the bases, so its apply() takes priority.
DiffusionRowParallelLinearWithLoRA ¶
Bases: DiffusionBaseLinearLayerWithLoRA, RowParallelLinearWithLoRA
Diffusion RowParallelLinear with LoRA. DiffusionBaseLinearLayerWithLoRA is listed first in the bases, so its apply() takes priority.