vllm_omni.diffusion.models.sensenova_u1
Modules:
| Name | Description |
|---|---|
| pipeline_sensenova_u1 | SenseNova-U1 Pipeline for vLLM-Omni. |
| sensenova_u1_transformer | Qwen3 LLM with Mixture-of-Tokenizers (MoT) for SenseNova-U1. |
SenseNovaU1Pipeline
Bases: Module, SupportsComponentDiscovery, DiffusionPipelineProfilerMixin
SenseNova-U1 text-to-image and image-to-image pipeline for vLLM-Omni.
Builds the full model graph internally:
- language_model: SenseNovaU1ForCausalLM (TP-aware)
- vision_model: NEOVisionModel (understanding branch)
- fm_modules: ModuleDict with vision_model_mot_gen, timestep_embedder, fm_head, etc.
img2img (image editing) is triggered when multi_modal_data["image"] is present in the prompt dict. The pipeline then uses triple KV caches (condition / img_condition / uncondition) with dual CFG (cfg_scale + img_cfg_scale).
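As a rough illustration of the two points above, the sketch below builds a prompt dict that would take the img2img path and shows one common way to combine three predictions under dual CFG. The combination formula is an assumption borrowed from the InstructPix2Pix-style formulation and is not taken from this pipeline's source. The helper names `is_img2img` and `dual_cfg_combine`, the `"prompt"` key, and the placeholder image value are likewise hypothetical.

```python
from typing import Any


def is_img2img(prompt: dict[str, Any]) -> bool:
    """Return True when the prompt dict carries an input image (hypothetical helper)."""
    mm_data = prompt.get("multi_modal_data") or {}
    return mm_data.get("image") is not None


def dual_cfg_combine(
    cond: list[float],
    img_cond: list[float],
    uncond: list[float],
    cfg_scale: float,
    img_cfg_scale: float,
) -> list[float]:
    """Combine three predictions element-wise.

    ASSUMED formula (InstructPix2Pix style), not confirmed for SenseNova-U1:
        out = uncond
              + img_cfg_scale * (img_cond - uncond)
              + cfg_scale * (cond - img_cond)
    """
    return [
        u + img_cfg_scale * (i - u) + cfg_scale * (c - i)
        for c, i, u in zip(cond, img_cond, uncond)
    ]


# Text-to-image: no image entry, so the img2img path is not triggered.
text_only = {"prompt": "a red fox in the snow"}

# Image editing: an image is present under multi_modal_data["image"].
edit = {
    "prompt": "make the fox white",
    "multi_modal_data": {"image": "<image object placeholder>"},
}
```

Under this assumed formula, setting both scales to 1.0 returns the fully conditioned prediction unchanged, which is a quick sanity check for any dual-CFG implementation.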
EXTRA_BODY_PARAMS class-attribute
EXTRA_BODY_PARAMS: frozenset[str] = frozenset(
    {
        "think",
        "cfg_scale",
        "cfg_norm",
        "timestep_shift",
        "t_eps",
        "img_cfg_scale",
        "max_tokens",
    }
)
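How the pipeline consumes this set is not shown here. One plausible use is as an allowlist for per-request overrides, and the sketch below illustrates that idea only. `split_extra_body` and the key `not_a_real_param` are hypothetical and not part of vLLM-Omni; the set is copied from the listing above so the snippet runs on its own.

```python
EXTRA_BODY_PARAMS: frozenset[str] = frozenset(
    {
        "think",
        "cfg_scale",
        "cfg_norm",
        "timestep_shift",
        "t_eps",
        "img_cfg_scale",
        "max_tokens",
    }
)


def split_extra_body(extra_body: dict) -> tuple[dict, dict]:
    """Split a request's extra body into accepted and rejected keys.

    Hypothetical helper: keys listed in EXTRA_BODY_PARAMS are accepted,
    everything else is returned separately so the caller can warn or ignore.
    """
    accepted = {k: v for k, v in extra_body.items() if k in EXTRA_BODY_PARAMS}
    rejected = {k: v for k, v in extra_body.items() if k not in EXTRA_BODY_PARAMS}
    return accepted, rejected


accepted, rejected = split_extra_body(
    {"cfg_scale": 4.0, "img_cfg_scale": 1.5, "not_a_real_param": 0}
)
```

Returning the rejected keys rather than raising keeps the decision (error, warning, or silent drop) with the caller.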
EXTRA_OUTPUT_PARAMS class-attribute
fm_modules instance-attribute
fm_modules = ModuleDict(
    {
        "vision_model_mot_gen": vision_model_mot_gen,
        "timestep_embedder": timestep_embedder,
        "fm_head": fm_head,
    }
)
img_context_token_id instance-attribute
img_context_token_id = convert_tokens_to_ids(
    IMG_CONTEXT_TOKEN
)
language_model instance-attribute
language_model = SenseNovaU1ForCausalLM(
    llm_cfg, prefix="language_model"
)
weights_sources instance-attribute
weights_sources = [
    ComponentSource(
        model_or_path=local_model_path,
        subfolder=None,
        revision=revision,
        prefix="",
        fall_back_to_pt=False,
    )
]
get_sensenova_u1_post_process_func
get_sensenova_u1_post_process_func(
    od_config: OmniDiffusionConfig,
)