vllm_omni.diffusion.lora.layers

Modules:

Name	Description
`base_linear`
`column_parallel_linear`
`replicated_linear`
`row_parallel_linear`

DiffusionBaseLinearLayerWithLoRA

Bases: BaseLinearLayerWithLoRA

Diffusion-specific base class that overrides apply() to use direct torch matmuls instead of punica_wrapper.

punica_wrapper is used to hold multiple LoRA slots and slices efficiently.

This matches the semantics of PunicaWrapperGPU.add_lora_linear():

- Shrink: buffer = x @ lora_a.T
- Expand: y += buffer @ lora_b.T

All other functionality (weight management, TP slicing, forward logic) is inherited from vLLM's BaseLinearLayerWithLoRA.
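
The shrink and expand steps can be sketched in a few lines of plain PyTorch. This is a minimal illustration, not the module's actual code: the function name `lora_linear_sketch` and the tensor shapes are assumptions, and any LoRA scaling factor is assumed to be already folded into `lora_b`.

```python
import torch


def lora_linear_sketch(x, base_out, lora_a, lora_b):
    """Hypothetical helper: add a single LoRA update to a base linear output.

    Assumed shapes: x (tokens, in), lora_a (rank, in), lora_b (out, rank).
    """
    buffer = x @ lora_a.T                # shrink: (tokens, rank)
    return base_out + buffer @ lora_b.T  # expand: (tokens, out)


torch.manual_seed(0)
x = torch.randn(4, 16, dtype=torch.float64)
weight = torch.randn(32, 16, dtype=torch.float64)
lora_a = torch.randn(8, 16, dtype=torch.float64)
lora_b = torch.randn(32, 8, dtype=torch.float64)

y = lora_linear_sketch(x, x @ weight.T, lora_a, lora_b)

# Reference: the same result computed from the merged weight.
merged = x @ (weight + lora_b @ lora_a).T
```

Applying the low-rank update as two small matmuls gives the same result as multiplying by the merged weight `weight + lora_b @ lora_a`, which is what `merged` is there to compare against.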

apply

apply(x: Tensor, bias: Tensor | None = None) -> Tensor

Override: uses a simple matmul instead of punica_wrapper.add_lora_linear().

This matches the exact computation in PunicaWrapperGPU.add_lora_linear() for the single-LoRA case. For packed projections (e.g. fused QKV), LoRA is applied per slice using output_slices.
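
A rough sketch of per-slice application for a packed projection is shown here. The helper name `apply_lora_per_slice`, the convention that `output_slices` holds the output width of each slice, and the skipping of `None` entries are assumptions made for illustration; the real method may differ in detail.

```python
import torch


def apply_lora_per_slice(x, base_out, lora_a_list, lora_b_list, output_slices):
    """Hypothetical helper: add one LoRA update per output slice.

    output_slices is assumed to hold the output width of each packed slice.
    """
    out = base_out.clone()
    offset = 0
    for lora_a, lora_b, width in zip(lora_a_list, lora_b_list, output_slices):
        if lora_a is not None and lora_b is not None:
            # Shrink then expand, written into this slice's output columns.
            out[:, offset:offset + width] += (x @ lora_a.T) @ lora_b.T
        offset += width
    return out


torch.manual_seed(0)
x = torch.randn(2, 8, dtype=torch.float64)
output_slices = (6, 4, 4)  # e.g. q, k, v widths of a fused QKV projection
base_out = torch.randn(2, sum(output_slices), dtype=torch.float64)
a_q = torch.randn(3, 8, dtype=torch.float64)
b_q = torch.randn(6, 3, dtype=torch.float64)
a_v = torch.randn(3, 8, dtype=torch.float64)
b_v = torch.randn(4, 3, dtype=torch.float64)

# The middle (k) slice has no LoRA and is left untouched.
out = apply_lora_per_slice(
    x, base_out, [a_q, None, a_v], [b_q, None, b_v], output_slices
)
```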

create_lora_weights

create_lora_weights(
    max_loras: int, lora_config, model_config=None
) -> None

reset_lora

reset_lora(index: int)

set_lora

set_lora(
    index: int,
    lora_a: Tensor | list[Tensor | None],
    lora_b: Tensor | list[Tensor | None],
)
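
The three methods above follow a slot-based lifecycle: allocate buffers for up to `max_loras` adapters, copy one adapter's weights into a slot, and clear a slot. The toy class below illustrates that lifecycle only. `ToyLoRASlots` is hypothetical, it takes a plain `max_rank` integer where the real method takes `lora_config`, and its buffer layout is an assumption rather than the layout this module uses.

```python
import torch


class ToyLoRASlots:
    """Hypothetical slot store for illustration; not the vLLM implementation."""

    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features

    def create_lora_weights(self, max_loras, max_rank):
        # Pre-allocate zeroed buffers, one slot per adapter.
        self.lora_a = torch.zeros(max_loras, max_rank, self.in_features)
        self.lora_b = torch.zeros(max_loras, self.out_features, max_rank)

    def reset_lora(self, index):
        # Clearing a slot makes its LoRA contribution zero.
        self.lora_a[index].zero_()
        self.lora_b[index].zero_()

    def set_lora(self, index, lora_a, lora_b):
        # Clear first so a lower-rank adapter leaves no stale rows behind.
        self.reset_lora(index)
        rank = lora_a.shape[0]
        self.lora_a[index, :rank].copy_(lora_a)
        self.lora_b[index, :, :rank].copy_(lora_b)


slots = ToyLoRASlots(in_features=16, out_features=32)
slots.create_lora_weights(max_loras=2, max_rank=8)
slots.set_lora(0, torch.ones(4, 16), torch.ones(32, 4))
filled = slots.lora_a[0].sum().item()
slots.reset_lora(0)
cleared = slots.lora_a[0].abs().sum().item()
```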

DiffusionColumnParallelLinearWithLoRA

Bases: DiffusionBaseLinearLayerWithLoRA, ColumnParallelLinearWithLoRA

Diffusion ColumnParallelLinear with LoRA. DiffusionBaseLinearLayerWithLoRA comes first in the bases so that its apply() takes priority.
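
The order of the base classes matters because of Python's method resolution order (MRO): attribute lookup walks the bases left to right, so the first base that defines apply() wins. The toy hierarchy below mirrors the shape of this class and its siblings. All `Toy*` names and the string return values are stand-ins, not the real vLLM classes.

```python
class ToyBaseLoRA:
    """Stand-in for the shared upstream base class."""

    def apply(self, x):
        return f"punica({x})"

    def forward(self, x):
        return self.apply(x)


class ToyDiffusionBaseLoRA(ToyBaseLoRA):
    """Stand-in for the diffusion base: overrides only apply()."""

    def apply(self, x):
        return f"matmul({x})"


class ToyColumnParallelLoRA(ToyBaseLoRA):
    """Stand-in for a parallel-layer class: overrides only forward()."""

    def forward(self, x):
        return "column:" + self.apply(x)


class ToyDiffusionColumnParallelLoRA(ToyDiffusionBaseLoRA, ToyColumnParallelLoRA):
    """The diffusion base is listed first, so its apply() is found first."""


layer = ToyDiffusionColumnParallelLoRA()
result = layer.forward("x")
mro_names = [cls.__name__ for cls in ToyDiffusionColumnParallelLoRA.__mro__]
```

In this sketch, forward() still comes from the parallel-layer stand-in while apply() comes from the diffusion base, which is the combination the base ordering is meant to achieve.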

DiffusionMergedColumnParallelLinearWithLoRA

Bases: DiffusionBaseLinearLayerWithLoRA, MergedColumnParallelLinearWithLoRA

Diffusion MergedColumnParallelLinear (gate_up_proj) with LoRA. DiffusionBaseLinearLayerWithLoRA comes first in the bases so that its apply() takes priority.

DiffusionMergedQKVParallelLinearWithLoRA

Bases: DiffusionBaseLinearLayerWithLoRA, MergedQKVParallelLinearWithLoRA

Diffusion MergedQKVParallelLinear (to_qkv) with three LoRAs. DiffusionBaseLinearLayerWithLoRA comes first in the bases so that its apply() takes priority.

DiffusionQKVParallelLinearWithLoRA

Bases: DiffusionBaseLinearLayerWithLoRA, QKVParallelLinearWithLoRA

Diffusion QKVParallelLinear with a single LoRA. DiffusionBaseLinearLayerWithLoRA comes first in the bases so that its apply() takes priority.

DiffusionReplicatedLinearWithLoRA

Bases: DiffusionBaseLinearLayerWithLoRA, ReplicatedLinearWithLoRA

Diffusion ReplicatedLinear with LoRA. DiffusionBaseLinearLayerWithLoRA comes first in the bases so that its apply() takes priority.

DiffusionRowParallelLinearWithLoRA

Bases: DiffusionBaseLinearLayerWithLoRA, RowParallelLinearWithLoRA

Diffusion RowParallelLinear with LoRA. DiffusionBaseLinearLayerWithLoRA comes first in the bases so that its apply() takes priority.