
vllm_omni.diffusion.lora.layers

Modules:

Name Description
base_linear
column_parallel_linear
replicated_linear
row_parallel_linear

DiffusionBaseLinearLayerWithLoRA

Bases: BaseLinearLayerWithLoRA

Diffusion-specific base that overrides apply() to use direct torch matmul instead of punica_wrapper.

punica_wrapper exists to handle multiple LoRA slots and slices efficiently; this class targets the single-LoRA case and bypasses it.

This matches the semantics of PunicaWrapperGPU.add_lora_linear():

- Shrink: buffer = x @ lora_a.T
- Expand: y += buffer @ lora_b.T

All other functionality (weight management, TP slicing, forward logic) is inherited from vLLM's BaseLinearLayerWithLoRA.
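The shrink and expand steps can be sketched with plain tensor operations. This is a minimal illustration, not the library's implementation: the function name `lora_linear_reference` and the tensor shapes are assumptions chosen for the example.

```python
import torch


def lora_linear_reference(x, base_out, lora_a, lora_b):
    """Single-LoRA shrink/expand with plain matmuls (illustrative only).

    Assumed shapes, not stated in the source:
      x:        (tokens, in_features)
      base_out: (tokens, out_features)
      lora_a:   (rank, in_features)
      lora_b:   (out_features, rank)
    """
    y = base_out.clone()
    buffer = x @ lora_a.T      # shrink: (tokens, rank)
    y += buffer @ lora_b.T     # expand: (tokens, out_features)
    return y


torch.manual_seed(0)
x = torch.randn(4, 16, dtype=torch.float64)
weight = torch.randn(32, 16, dtype=torch.float64)
lora_a = torch.randn(8, 16, dtype=torch.float64)
lora_b = torch.randn(32, 8, dtype=torch.float64)

base_out = x @ weight.T
y = lora_linear_reference(x, base_out, lora_a, lora_b)

# The result equals a linear layer whose weight has the low-rank
# update lora_b @ lora_a folded in.
merged = x @ (weight + lora_b @ lora_a).T
```

Keeping the two matmuls separate avoids materializing the full (out_features, in_features) update, which is the usual reason to compute LoRA as shrink then expand.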

apply

apply(x: Tensor, bias: Tensor | None = None) -> Tensor

Override: use a simple matmul instead of punica_wrapper.add_lora_linear().

This matches the exact computation in PunicaWrapperGPU.add_lora_linear() for the single-LoRA case. For packed projections (e.g. fused QKV), we apply LoRA per-slice using output_slices.
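A rough sketch of per-slice application for a packed projection follows. The helper name `apply_lora_per_slice`, the shapes, and the choice to skip slices whose weights are `None` are assumptions for illustration; only the name `output_slices` comes from the description above.

```python
import torch


def apply_lora_per_slice(x, y, lora_a_list, lora_b_list, output_slices):
    """Add each slice's LoRA delta to its own column range of y (in place).

    Assumed layout: y is (tokens, sum(output_slices)); slice i owns the
    columns [offset, offset + output_slices[i]).
    """
    offset = 0
    for lora_a, lora_b, size in zip(lora_a_list, lora_b_list, output_slices):
        # Assumption: a slice without LoRA weights is left untouched.
        if lora_a is not None and lora_b is not None:
            y[:, offset:offset + size] += (x @ lora_a.T) @ lora_b.T
        offset += size
    return y


torch.manual_seed(0)
output_slices = (8, 4, 4)  # e.g. Q, K, V widths in a fused projection
x = torch.randn(2, 16, dtype=torch.float64)
a_q = torch.randn(2, 16, dtype=torch.float64)
b_q = torch.randn(8, 2, dtype=torch.float64)
a_v = torch.randn(2, 16, dtype=torch.float64)
b_v = torch.randn(4, 2, dtype=torch.float64)

y = torch.zeros(2, sum(output_slices), dtype=torch.float64)
out = apply_lora_per_slice(
    x, y, [a_q, None, a_v], [b_q, None, b_v], output_slices
)
```

Here the K slice has no adapter, so its columns in `out` keep their original values while the Q and V columns receive their own low-rank updates.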

create_lora_weights

create_lora_weights(
    max_loras: int, lora_config, model_config=None
) -> None

reset_lora

reset_lora(index: int)

set_lora

set_lora(
    index: int,
    lora_a: Tensor | list[Tensor | None],
    lora_b: Tensor | list[Tensor | None],
)

DiffusionColumnParallelLinearWithLoRA

Bases: DiffusionBaseLinearLayerWithLoRA, ColumnParallelLinearWithLoRA

Diffusion ColumnParallelLinear with LoRA. DiffusionBaseLinearLayerWithLoRA is listed first in the bases so that its apply() takes priority.
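The priority comes from Python's method resolution order (MRO): the base listed first is searched first. The toy classes below are hypothetical stand-ins, not the real vLLM or vllm_omni classes, and only mirror the inheritance shape.

```python
class ToyBaseLoRA:
    """Stand-in for the upstream base layer."""

    def apply(self):
        return "punica"


class ToyDiffusionBase(ToyBaseLoRA):
    """Stand-in for the diffusion base that overrides apply()."""

    def apply(self):
        return "matmul"


class ToyColumnParallelLoRA(ToyBaseLoRA):
    """Stand-in for an upstream parallel layer with its own helpers."""

    def apply(self):
        return "punica (column-parallel)"

    def slice_weights(self):
        return "column slicing"


class ToyDiffusionColumnParallel(ToyDiffusionBase, ToyColumnParallelLoRA):
    """Diffusion base first, so its apply() is found first."""


layer = ToyDiffusionColumnParallel()
result = layer.apply()            # resolved on ToyDiffusionBase
helper = layer.slice_weights()    # still inherited from the parallel layer
mro_names = [cls.__name__ for cls in ToyDiffusionColumnParallel.__mro__]
```

Swapping the order of the two bases would make the parallel layer's `apply()` win instead, which is why the ordering in the `Bases:` lines matters.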

DiffusionMergedColumnParallelLinearWithLoRA

Bases: DiffusionBaseLinearLayerWithLoRA, MergedColumnParallelLinearWithLoRA

Diffusion MergedColumnParallelLinear (gate_up_proj) with LoRA. DiffusionBaseLinearLayerWithLoRA is listed first in the bases so that its apply() takes priority.

DiffusionMergedQKVParallelLinearWithLoRA

Bases: DiffusionBaseLinearLayerWithLoRA, MergedQKVParallelLinearWithLoRA

Diffusion MergedQKVParallelLinear (to_qkv) with 3 LoRAs. DiffusionBaseLinearLayerWithLoRA is listed first in the bases so that its apply() takes priority.

DiffusionQKVParallelLinearWithLoRA

Bases: DiffusionBaseLinearLayerWithLoRA, QKVParallelLinearWithLoRA

Diffusion QKVParallelLinear with a single LoRA. DiffusionBaseLinearLayerWithLoRA is listed first in the bases so that its apply() takes priority.

DiffusionReplicatedLinearWithLoRA

Bases: DiffusionBaseLinearLayerWithLoRA, ReplicatedLinearWithLoRA

Diffusion ReplicatedLinear with LoRA. DiffusionBaseLinearLayerWithLoRA is listed first in the bases so that its apply() takes priority.

DiffusionRowParallelLinearWithLoRA

Bases: DiffusionBaseLinearLayerWithLoRA, RowParallelLinearWithLoRA

Diffusion RowParallelLinear with LoRA. DiffusionBaseLinearLayerWithLoRA is listed first in the bases so that its apply() takes priority.