
vllm_omni.diffusion.hooks.sequence_parallel

Sequence Parallelism hooks for non-intrusive SP support.

This module implements the hook-based mechanism for applying sequence parallelism to models without modifying their forward() methods.

Usage
  1. Define _sp_plan on your model class (corresponds to diffusers' _cp_plan)
  2. Call apply_sequence_parallel(model, config, plan) to enable SP
  3. Call remove_sequence_parallel(model, plan) to disable SP

The hooks automatically shard inputs before forward and gather outputs after, based on the plan specification.
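The shard-before, gather-after idea can be sketched without any distributed runtime. The snippet below is a minimal illustration, not the module's implementation: `ToyHook`, `ToySplitHook`, and `wrap_forward` are hypothetical names, plain Python lists stand in for tensors, and the "gather" is simulated by concatenating the per-rank results in one process.

```python
from typing import Any, Callable


class ToyHook:
    """Hypothetical stand-in for a model hook: runs code around forward()."""

    def pre_forward(self, args: tuple, kwargs: dict) -> tuple[tuple, dict]:
        return args, kwargs

    def post_forward(self, output: Any) -> Any:
        return output


class ToySplitHook(ToyHook):
    """Keeps only this rank's contiguous chunk of one named keyword input."""

    def __init__(self, name: str, rank: int, world_size: int) -> None:
        self.name, self.rank, self.world_size = name, rank, world_size

    def pre_forward(self, args: tuple, kwargs: dict) -> tuple[tuple, dict]:
        seq = kwargs[self.name]
        chunk = len(seq) // self.world_size  # assumes an evenly divisible length
        start = self.rank * chunk
        return args, {**kwargs, self.name: seq[start:start + chunk]}


def wrap_forward(forward: Callable[..., Any], hook: ToyHook) -> Callable[..., Any]:
    """Return a callable that runs the hook around the untouched forward()."""

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        args, kwargs = hook.pre_forward(args, kwargs)
        return hook.post_forward(forward(*args, **kwargs))

    return wrapped


def forward(hidden_states: list[int]) -> list[int]:
    # The model code itself knows nothing about sharding.
    return [2 * x for x in hidden_states]


# Simulate two SP ranks in one process, then "gather" by concatenation.
shards = [
    wrap_forward(forward, ToySplitHook("hidden_states", rank, 2))(
        hidden_states=[1, 2, 3, 4]
    )
    for rank in range(2)
]
gathered = [x for shard in shards for x in shard]
print(shards, gathered)
```

The point of the design is that `forward` stays unchanged; all sharding logic lives in the hook that wraps it.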

logger module-attribute

logger = init_logger(__name__)

ModuleForwardMetadata dataclass

Metadata for mapping forward() parameter names to args/kwargs positions.

This caches the inspection of a module's forward signature to efficiently locate parameters by name in subsequent calls.
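As a rough sketch of what such a cache could look like, the snippet below uses the standard library's `inspect.signature` to build a name-to-position map once and reuse it afterwards. `ToyForwardMetadata` and its `lookup` method are hypothetical; only the `cached_parameter_indices` field name is taken from this page.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToyForwardMetadata:
    """Hypothetical cache, loosely modelled on ModuleForwardMetadata."""

    cached_parameter_indices: Optional[dict[str, int]] = None

    def lookup(
        self, forward: Callable[..., Any], name: str, args: tuple, kwargs: dict
    ) -> Any:
        # Keyword arguments can be read directly by name.
        if name in kwargs:
            return kwargs[name]
        # Inspect the signature only once; later calls reuse the cached map.
        if self.cached_parameter_indices is None:
            names = inspect.signature(forward).parameters
            self.cached_parameter_indices = {n: i for i, n in enumerate(names)}
        index = self.cached_parameter_indices.get(name)
        if index is None or index >= len(args):
            raise ValueError(f"parameter {name!r} was not passed to forward()")
        return args[index]


def forward(hidden_states, encoder_hidden_states=None, timestep=None):
    return hidden_states


meta = ToyForwardMetadata()
found = meta.lookup(forward, "encoder_hidden_states", ("h", "e"), {})
print(found, meta.cached_parameter_indices)
```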

cached_parameter_indices class-attribute instance-attribute

cached_parameter_indices: dict[str, int] | None = None

SequenceParallelGatherHook

Bases: ModelHook

Hook for gathering outputs after a module's forward pass.

This hook is registered to modules that need their outputs gathered from all sequence parallel ranks. It intercepts the output and gathers it according to the plan specification.

Note: This corresponds to ContextParallelGatherHook in diffusers.

config instance-attribute

config = config

metadata instance-attribute

metadata = metadata

initialize_hook

initialize_hook(module: Module) -> Module

post_forward

post_forward(module: Module, output: Any) -> Any

Gather outputs after forward and remove padding if applied.
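Padding matters when the sequence length is not divisible by the number of SP ranks. The following sketch shows one way padding and its later removal could fit together; how the real hook records the pad length is not stated on this page, so `pad_and_split` and `gather_and_unpad` are hypothetical helpers operating on plain lists.

```python
def pad_and_split(seq: list[int], world_size: int, pad_value: int = 0):
    """Pad seq to a multiple of world_size and cut it into equal shards."""
    pad = (-len(seq)) % world_size
    padded = seq + [pad_value] * pad
    chunk = len(padded) // world_size
    shards = [padded[r * chunk:(r + 1) * chunk] for r in range(world_size)]
    return shards, pad


def gather_and_unpad(shards: list[list[int]], pad: int) -> list[int]:
    """Concatenate shards in rank order and drop the trailing padding."""
    gathered = [x for shard in shards for x in shard]
    return gathered[:len(gathered) - pad] if pad else gathered


shards, pad = pad_and_split([1, 2, 3, 4, 5], world_size=2)
restored = gather_and_unpad(shards, pad)
print(shards, pad, restored)
```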

SequenceParallelSplitHook

Bases: ModelHook

Hook for splitting inputs before a module's forward pass.

This hook is registered to modules that need their inputs sharded across sequence parallel ranks. It intercepts the forward call, shards specified inputs according to the plan, and passes the sharded inputs to the original forward.

For plan entries with split_output=True, the hook shards the module's output instead of its input.

Supports both SequenceParallelInput (full split) and SequenceParallelPartialInput (partial split for text/image separation).

Note: This corresponds to ContextParallelSplitHook in diffusers.
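The difference between a full and a partial split can be illustrated as follows. This is one plausible reading only: it assumes a partial split keeps a leading text segment intact on every rank and shards just the trailing image segment. The actual semantics of SequenceParallelPartialInput are not spelled out on this page, and `full_split`, `partial_split`, and `text_len` are hypothetical names.

```python
def full_split(seq: list[str], rank: int, world_size: int) -> list[str]:
    """Give each rank one contiguous chunk of the whole sequence."""
    chunk = len(seq) // world_size  # assumes an evenly divisible length
    return seq[rank * chunk:(rank + 1) * chunk]


def partial_split(
    seq: list[str], text_len: int, rank: int, world_size: int
) -> list[str]:
    """Assumed behaviour: replicate the text part, shard only the image part."""
    text, image = seq[:text_len], seq[text_len:]
    return text + full_split(image, rank, world_size)


tokens = ["t0", "t1", "i0", "i1", "i2", "i3"]
full = [full_split(tokens, r, 2) for r in range(2)]
partial = [partial_split(tokens, 2, r, 2) for r in range(2)]
print(full)
print(partial)
```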

config instance-attribute

config = config

metadata instance-attribute

metadata = metadata

module_forward_metadata instance-attribute

module_forward_metadata: ModuleForwardMetadata | None = None

initialize_hook

initialize_hook(module: Module) -> Module

post_forward

post_forward(module: Module, output: Any) -> Any

Shard outputs for split_output=True entries.

pre_forward

pre_forward(
    module: Module, *args: Any, **kwargs: Any
) -> tuple[tuple, dict]

Shard inputs before forward.

apply_sequence_parallel

apply_sequence_parallel(
    module: Module,
    config: SequenceParallelConfig,
    plan: SequenceParallelModelPlan,
) -> None

Apply sequence parallel hooks to a model according to the plan.

This function registers hooks on the specified submodules to automatically shard inputs and gather outputs for sequence parallelism.

Note: This corresponds to apply_context_parallel in diffusers.

The complete SP flow is:
  1. Input sharding (SequenceParallelSplitHook): Split sequence across SP ranks
  2. Attention parallelism (handled by vLLM-Omni's Attention layer):
     - Ulysses: All-to-All over Q/K/V heads
     - Ring: K/V circulation in ring topology
     - Hybrid: Both (Ulysses handles head redistribution, Ring handles K/V)
  3. Output gathering (SequenceParallelGatherHook): Gather sequence from SP ranks
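To make the Ulysses step of the flow concrete, the sketch below simulates the All-to-All exchange with nested lists in a single process. It is a toy model under stated assumptions, not vLLM-Omni's Attention layer: `ulysses_all_to_all` is a hypothetical function, and each entry is a label such as "s2h1" (token 2, head 1). Before the exchange each rank holds a slice of the sequence with all heads; afterwards each rank holds the full sequence for a slice of the heads.

```python
def ulysses_all_to_all(per_rank: list[list[list[str]]]) -> list[list[list[str]]]:
    """Map per_rank[src][local_token][head] to out[dst][token][local_head]."""
    world_size = len(per_rank)
    num_heads = len(per_rank[0][0])
    heads_per_rank = num_heads // world_size  # assumes heads divide evenly
    out = []
    for dst in range(world_size):
        lo, hi = dst * heads_per_rank, (dst + 1) * heads_per_rank
        rows = []
        # Visiting sources in rank order restores the original token order.
        for src in range(world_size):
            for token in per_rank[src]:
                rows.append(token[lo:hi])
        out.append(rows)
    return out


# Two ranks, four tokens, two heads: rank 0 owns tokens 0-1, rank 1 owns 2-3.
before = [
    [["s0h0", "s0h1"], ["s1h0", "s1h1"]],
    [["s2h0", "s2h1"], ["s3h0", "s3h1"]],
]
after = ulysses_all_to_all(before)
print(after)
```

With the full sequence available per head, each rank can run ordinary attention on its own heads; a second exchange in the opposite direction would restore the sequence-sharded layout.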

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| module | Module | The model to apply SP to. | required |
| config | SequenceParallelConfig | The sequence parallel configuration. | required |
| plan | SequenceParallelModelPlan | Dictionary mapping module names to input/output specifications. | required |

Example

    config = SequenceParallelConfig(ulysses_degree=2)
    plan = {
        "": {"hidden_states": SequenceParallelInput(split_dim=1, expected_dims=3)},
        "proj_out": SequenceParallelOutput(gather_dim=1, expected_dims=3),
    }
    apply_sequence_parallel(model, config, plan)

Note

vLLM-Omni's Attention layer automatically handles the internal parallelism (Ulysses All-to-All or Ring attention) based on the forward_context configuration. This function only handles input/output sharding for the model as a whole.

disable_sequence_parallel_for_model

disable_sequence_parallel_for_model(model: Module) -> None

Disable sequence parallelism for a model.

Note: This corresponds to disable_context_parallel_for_model in diffusers.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | Module | The model to disable SP for. | required |

enable_sequence_parallel_for_model

enable_sequence_parallel_for_model(
    model: Module,
    config: SequenceParallelConfig | None = None,
) -> None

Enable sequence parallelism for a model using its _sp_plan.

This is a convenience function that reads the model's _sp_plan attribute and applies sequence parallelism automatically.

Note: This corresponds to enable_context_parallel_for_model in diffusers, but uses vLLM-Omni's _sp_plan instead of diffusers' _cp_plan.

The function performs two main tasks:
  1. Applies _sp_plan hooks to shard inputs and gather outputs
  2. Ensures Attention layers are configured for the correct parallel mode (handled automatically by vLLM-Omni's forward_context mechanism)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | Module | The model to enable SP for. Must have a _sp_plan attribute. | required |
| config | SequenceParallelConfig \| None | Optional config. If None, uses default based on current parallel state. | None |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If model has no _sp_plan defined. |
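The documented error path can be pictured with a short sketch. It only mirrors the described behaviour (read `_sp_plan` from the model, raise ValueError if it is missing); `read_sp_plan` and the two toy classes are hypothetical and say nothing about how the real function is written.

```python
from typing import Any


def read_sp_plan(model: Any) -> dict:
    """Return the model's _sp_plan or raise ValueError if none is defined."""
    plan = getattr(model, "_sp_plan", None)
    if plan is None:
        raise ValueError(f"{type(model).__name__} does not define _sp_plan")
    return plan


class WithPlan:
    # Placeholder plan contents; real plans map module names to SP specs.
    _sp_plan = {"": "input spec", "proj_out": "output spec"}


class WithoutPlan:
    pass


plan = read_sp_plan(WithPlan())
try:
    read_sp_plan(WithoutPlan())
    error = None
except ValueError as exc:
    error = str(exc)
print(plan, error)
```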

Note

vLLM-Omni supports Ulysses + Ring hybrid parallelism:
  - ulysses_degree > 1: Uses All-to-All communication over Q/K/V heads
  - ring_degree > 1: Uses Ring attention with K/V passing
  - Both > 1: Hybrid mode (Ulysses handles head redistribution, Ring handles K/V circulation)
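The three cases in this note reduce to a small decision on the two degrees. The helper below is a hypothetical illustration of that mapping; `describe_sp_mode` is not part of the module, and the "none" label for the case where both degrees are 1 is an assumption added for completeness.

```python
def describe_sp_mode(ulysses_degree: int = 1, ring_degree: int = 1) -> str:
    """Name the attention parallel mode implied by the two degrees."""
    if ulysses_degree < 1 or ring_degree < 1:
        raise ValueError("degrees must be positive integers")
    if ulysses_degree > 1 and ring_degree > 1:
        return "hybrid"   # Ulysses head redistribution plus Ring K/V circulation
    if ulysses_degree > 1:
        return "ulysses"  # All-to-All over Q/K/V heads
    if ring_degree > 1:
        return "ring"     # K/V passed around a ring of ranks
    return "none"         # assumed label: no sequence parallelism requested


modes = [describe_sp_mode(u, r) for u, r in [(2, 1), (1, 2), (2, 2), (1, 1)]]
print(modes)
```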

remove_sequence_parallel

remove_sequence_parallel(
    module: Module, plan: SequenceParallelModelPlan
) -> None

Remove sequence parallel hooks from a model.

Note: This corresponds to remove_context_parallel in diffusers.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| module | Module | The model to remove SP from. | required |
| plan | SequenceParallelModelPlan | The same plan used when applying SP. | required |
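One likely reason the same plan is required is that the plan's keys identify which submodules received hooks. The sketch below illustrates that idea with toy objects; `ToyModule`, `apply_plan`, and `remove_plan` are hypothetical, and the empty string is assumed to name the root module, as in the example plan for apply_sequence_parallel.

```python
class ToyModule:
    """Hypothetical module tree with a per-module hook registry."""

    def __init__(self, **children: "ToyModule") -> None:
        self.children = children
        self.hooks: dict[str, str] = {}

    def get_submodule(self, name: str) -> "ToyModule":
        module = self
        # An empty name has no parts, so it resolves to the root itself.
        for part in filter(None, name.split(".")):
            module = module.children[part]
        return module


def apply_plan(model: ToyModule, plan: dict[str, str]) -> None:
    for name, spec in plan.items():
        model.get_submodule(name).hooks["sp"] = spec


def remove_plan(model: ToyModule, plan: dict[str, str]) -> None:
    # Walk the same keys so that exactly the hooked submodules are cleaned up.
    for name in plan:
        model.get_submodule(name).hooks.pop("sp", None)


model = ToyModule(proj_out=ToyModule())
plan = {"": "split", "proj_out": "gather"}

apply_plan(model, plan)
hooked = (dict(model.hooks), dict(model.children["proj_out"].hooks))
remove_plan(model, plan)
cleared = (dict(model.hooks), dict(model.children["proj_out"].hooks))
print(hooked, cleared)
```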