CascadeV: An Implementation of Wurstchen Architecture for Video Generation
CascadeV is a cascaded latent diffusion model that generates 2K-resolution videos with improved efficiency. It employs a novel 3D attention mechanism and can be combined with existing text-to-video models to increase their resolution and frame rate without fine-tuning.