Image generation with end-to-end training and benefits of a good VAE

#architecture #deep-learning #image-generation #artificial-intelligence #communication #machine-learning #networking #processing

Latent diffusion models underlie modern image generation: a variational auto-encoder (VAE) encodes and decodes images, and a diffusion transformer performs generation in the latent space. While end-to-end training has long been the spirit of deep learning, latent diffusion models are, surprisingly, not trained end-to-end, which creates representation bottlenecks. In this talk, I will introduce our work that jointly trains the VAE and the diffusion transformer, and show how this accelerates training and yields high-quality images. I will then discuss use cases where the resulting end-to-end trained VAEs bring significant benefits, including higher-quality text-to-image generation and automatic agentic search over diffusion transformer architectures. I will conclude with new perspectives.

 

Liang Zheng

Australian National University

Dr. Liang Zheng is an Associate Professor at the Australian National University and a Research Scientist at Canva. He is interested in representation learning for perception and generation. He has contributed many widely used datasets and methods to the object re-identification field that were later adopted in broader domains. He currently works on image generation, in both pre-training and post-training. He is a Program Chair for ACM MM'24, MM'28, and AVSS'24, and a General Chair for AVSS'27 and DICTA 2027. He is a regular area chair for major conferences and an Associate Editor of TPAMI. He holds bachelor's degrees in Biology and Economics and a PhD in Computer Science from Tsinghua University.



  Date and Time

  Location

  Hosts

  Registration



  • Add Event to Calendar

