vllm_omni.diffusion.models.hunyuan_image3.autoencoder ¶
AutoencoderKLConv3D ¶
Bases: ModelMixin, ConfigMixin
Autoencoder model with a KL-regularized latent space, built on 3D convolutions.
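To make "KL-regularized latent space" concrete, here is a minimal, dependency-free sketch of the standard diagonal Gaussian posterior used by KL autoencoders. It is not taken from this module: the helper names (`split_moments`, `sample`, `kl_to_standard_normal`) are hypothetical, plain Python lists stand in for tensors, and the convention that the encoder output holds the means first and the log-variances second is an assumption.

```python
import math
import random


def split_moments(moments):
    """Split a flat list [mean..., logvar...] into its two halves.

    Assumption: means come first, log-variances second.
    """
    half = len(moments) // 2
    return moments[:half], moments[half:]


def sample(mean, logvar, rng):
    """Reparameterized sample: z = mean + std * eps, with eps ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, logvar)
    ]


def kl_to_standard_normal(mean, logvar):
    """KL(N(mean, var) || N(0, 1)), summed over all dimensions."""
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mean, logvar)
    )


# Toy "encoder output" with two latent dimensions.
mean, logvar = split_moments([0.5, -1.0, 0.0, 0.0])
z = sample(mean, logvar, random.Random(0))
kl = kl_to_standard_normal(mean, logvar)
```

The KL term is zero only when the posterior equals the standard normal, which is what pulls the latent space toward a well-behaved prior during training.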
decoder instance-attribute ¶
decoder = Decoder(
    z_channels=latent_channels,
    out_channels=out_channels,
    block_out_channels=list(reversed(block_out_channels)),
    num_res_blocks=layers_per_block,
    ffactor_spatial=ffactor_spatial,
    ffactor_temporal=ffactor_temporal,
    upsample_match_channel=upsample_match_channel,
)
encoder instance-attribute ¶
encoder = Encoder(
    in_channels=in_channels,
    z_channels=latent_channels,
    block_out_channels=block_out_channels,
    num_res_blocks=layers_per_block,
    ffactor_spatial=ffactor_spatial,
    ffactor_temporal=ffactor_temporal,
    downsample_match_channel=downsample_match_channel,
)
decode ¶
decode(z: Tensor, return_dict: bool = True, generator=None)
Decodes the input by passing it through the decoder network. Supports slicing and tiling for memory efficiency.
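The sketch below illustrates what slicing and tiling usually mean for an autoencoder decode step. It is a 1D toy in plain Python and makes several assumptions: `toy_decode` is a hypothetical stand-in for the decoder, and the tile size, overlap, and linear cross-fade are illustrative choices, not values or logic read from this module.

```python
def toy_decode(latent):
    """Hypothetical decoder stand-in: nearest-neighbour 2x upsampling."""
    out = []
    for v in latent:
        out.extend([v, v])
    return out


def blend(left, right, overlap):
    """Cross-fade the start of `right` with the tail of `left`."""
    n = min(overlap, len(left), len(right))
    blended = list(right)
    for i in range(n):
        w = i / n
        blended[i] = left[len(left) - n + i] * (1 - w) + right[i] * w
    return blended


def tiled_decode(latent, tile=4, overlap=1, scale=2):
    """Decode overlapping tiles one at a time, then stitch them together."""
    stride = tile - overlap
    tiles = [
        toy_decode(latent[s:s + tile]) for s in range(0, len(latent), stride)
    ]
    out = []
    for i, t in enumerate(tiles):
        if i > 0:
            t = blend(tiles[i - 1], t, overlap * scale)
        # Keep only the part of each tile that the next tile does not cover.
        out.extend(t[:stride * scale])
    return out


def sliced_decode(batch, **kwargs):
    """Slicing: decode each batch item separately to cap peak memory."""
    return [tiled_decode(item, **kwargs) for item in batch]


latent = [float(i) for i in range(10)]
full = toy_decode(latent)
tiled = tiled_decode(latent)
batch_out = sliced_decode([latent, latent])
```

Both techniques trade extra passes for lower peak memory: slicing bounds memory by batch item, tiling bounds it by spatial extent. With a real decoder, tiles do not agree exactly in the overlap, which is why the cross-fade is needed.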
Conv3d ¶
Bases: Conv3d
Performs Conv3d patch by patch; the result matches nn.Conv3d to within 1e-5. Only symmetric padding is supported.
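The idea of a patch-wise convolution can be shown in one dimension with plain Python. This is a sketch under stated assumptions, not the class's implementation: `conv1d` and `conv1d_patched` are hypothetical helpers, the stride is fixed at 1, and the patch size is arbitrary. The input is padded symmetrically once, and each output patch is then computed from its own input window plus a halo of `len(kernel) - 1` samples.

```python
def conv1d(x, kernel, padding):
    """Stride-1 cross-correlation with symmetric zero padding."""
    padded = [0.0] * padding + list(x) + [0.0] * padding
    k = len(kernel)
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(padded) - k + 1)
    ]


def conv1d_patched(x, kernel, padding, patch=4):
    """Same convolution, computed one output patch at a time."""
    padded = [0.0] * padding + list(x) + [0.0] * padding
    k = len(kernel)
    n_out = len(padded) - k + 1
    out = []
    for start in range(0, n_out, patch):
        stop = min(start + patch, n_out)
        # Input window for this output patch, including the k - 1 halo.
        chunk = padded[start:stop + k - 1]
        out.extend(conv1d(chunk, kernel, padding=0))
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
kernel = [0.25, 0.5, 0.25]
reference = conv1d(x, kernel, padding=1)
patched = conv1d_patched(x, kernel, padding=1)
max_diff = max(abs(a - b) for a, b in zip(reference, patched))
```

In this toy the two results are identical because the arithmetic is performed in the same order. On real hardware, different patch shapes can select different kernels or accumulation orders, which is a plausible source of the small tolerance quoted above.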
Decoder ¶
Bases: Module
The decoder network of AutoencoderKLConv3D.
DecoderOutput dataclass ¶
Bases: BaseOutput
posterior class-attribute instance-attribute ¶
posterior: DiagonalGaussianDistribution | None = None
DiagonalGaussianDistribution ¶
DownsampleDCAE ¶
Encoder ¶
Bases: Module
The encoder network of AutoencoderKLConv3D.
ResnetBlock ¶
Bases: Module
conv1 instance-attribute ¶
conv1 = Conv3d(
    in_channels,
    out_channels,
    kernel_size=3,
    stride=1,
    padding=1,
)
conv2 instance-attribute ¶
conv2 = Conv3d(
    out_channels,
    out_channels,
    kernel_size=3,
    stride=1,
    padding=1,
)
nin_shortcut instance-attribute ¶
nin_shortcut = Conv3d(
    in_channels,
    out_channels,
    kernel_size=1,
    stride=1,
    padding=0,
)