vllm_omni.diffusion.models.dreamzero.image_encoder ¶
DreamZero image encoder.
Only the visual tower used for DreamZero I2V inference is ported here. Checkpoint keys under action_head.image_encoder.* load after stripping that prefix.
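The prefix stripping mentioned above can be sketched as follows. This is a minimal illustration, not the loader this module ships: the helper name `strip_prefix` and the example key names are hypothetical, and the string values stand in for tensors.

```python
PREFIX = "action_head.image_encoder."


def strip_prefix(state_dict, prefix=PREFIX):
    """Keep only the keys under `prefix` and drop the prefix from each one."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Hypothetical checkpoint contents; a real checkpoint maps keys to tensors.
checkpoint = {
    "action_head.image_encoder.visual.pos_embedding": "tensor_a",
    "action_head.image_encoder.visual.patch_embedding.weight": "tensor_b",
    "action_head.other_module.weight": "tensor_c",
}

encoder_state = strip_prefix(checkpoint)
print(sorted(encoder_state))
```

A dictionary produced this way would typically be passed to `load_state_dict` on the encoder module. Keys outside the prefix are dropped rather than renamed.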
DreamZeroImageEncoder ¶
Bases: Module
Image encoder wrapper.
transforms instance-attribute ¶
transforms = Compose(
    [
        Normalize(
            mean=[0.48145466, 0.4578275, 0.40821073],
            std=[0.26862954, 0.26130258, 0.27577711],
        )
    ]
)
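The mean and std values above match the widely used CLIP preprocessing constants. As a rough sketch of the arithmetic that Normalize applies per channel, `(value - mean) / std`, the plain-Python example below normalizes a single RGB pixel. The helper name `normalize_pixel` is hypothetical, and it assumes the input is already scaled to the [0, 1] range, which this page does not state.

```python
MEAN = [0.48145466, 0.4578275, 0.40821073]
STD = [0.26862954, 0.26130258, 0.27577711]


def normalize_pixel(rgb):
    """Apply (value - mean) / std to each channel of one RGB pixel."""
    return [(value - m) / s for value, m, s in zip(rgb, MEAN, STD)]


# Assumption: channel values are floats in [0, 1].
black = normalize_pixel([0.0, 0.0, 0.0])
white = normalize_pixel([1.0, 1.0, 1.0])
at_mean = normalize_pixel(MEAN)

print(black, white, at_mean)
```

A pixel equal to the channel means maps to zero in every channel, black maps to negative values, and white maps to positive values.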
DreamZeroLayerNorm ¶
DreamZeroVisionAttentionBlock ¶
Bases: Module
Attention block for the vision tower.
attn instance-attribute ¶
attn = DreamZeroVisionSelfAttention(
    dim, num_heads, proj_dropout=proj_dropout
)
mlp instance-attribute ¶
mlp = Sequential(
    Linear(dim, hidden_dim),
    GELU(),
    Linear(hidden_dim, dim),
    Dropout(proj_dropout),
)
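To give a sense of the size of the mlp above, the sketch below counts its learnable parameters. It assumes both Linear layers keep their default bias term; GELU and Dropout contribute no parameters. The helper names and the example sizes are hypothetical, and this page does not show how hidden_dim is derived from dim.

```python
def linear_params(in_features, out_features, bias=True):
    """Parameter count of one Linear layer: weight matrix plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)


def mlp_params(dim, hidden_dim):
    """Parameters of Linear(dim, hidden_dim) followed by Linear(hidden_dim, dim)."""
    return linear_params(dim, hidden_dim) + linear_params(hidden_dim, dim)


# Hypothetical sizes chosen only for illustration.
dim, hidden_dim = 8, 32
total = mlp_params(dim, hidden_dim)
print(total)
```

With these sizes the first layer holds 8 * 32 + 32 = 288 parameters and the second holds 32 * 8 + 8 = 264, for 552 in total.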
DreamZeroVisionSelfAttention ¶
Bases: Module
Self-attention for the vision tower.
DreamZeroVisionTransformer ¶
Bases: Module
Vision transformer used by the image encoder.
patch_embedding instance-attribute ¶
pos_embedding instance-attribute ¶
pre_norm instance-attribute ¶
pre_norm = (
    DreamZeroLayerNorm(dim, eps=norm_eps)
    if pre_norm
    else None
)
transformer instance-attribute ¶
transformer = Sequential(
    *[
        DreamZeroVisionAttentionBlock(
            dim=dim,
            mlp_ratio=mlp_ratio,
            num_heads=num_heads,
            post_norm=post_norm,
            activation=activation,
            proj_dropout=proj_dropout,
            norm_eps=norm_eps,
        )
        for _ in range(num_layers)
    ]
)