IEEE CIR: Understanding Neural Collapse in Deep Learning
In the past decade, the revival of deep neural networks has led to dramatic success in numerous applications, ranging from computer vision to natural language processing to scientific discovery and beyond. Nevertheless, the practice of deep networks remains shrouded in mystery, as our theoretical understanding of why deep learning succeeds is still elusive.
In this talk, we will focus on the representations learned by deep neural networks. In particular, neural collapse is an intriguing empirical phenomenon that persists across different neural network architectures and a variety of standard datasets. The phenomenon means that (i) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF), up to scaling, and (ii) the cross-example within-class variability of the last-layer activations collapses to zero. We will provide a geometric analysis, based on a simplified unconstrained feature model, of why this happens. We will also exploit these findings to improve training efficiency: we can set the feature dimension equal to the number of classes and fix the last-layer classifier to be a Simplex ETF during network training, reducing memory cost by over 20% on ResNet18 without sacrificing generalization performance.
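To make the two ideas in the abstract concrete, here is a minimal PyTorch sketch (an illustration, not code from the talk): it constructs the K vertices of a Simplex ETF, checks the equiangular geometry, and freezes the ETF as a non-trainable last-layer classifier with feature dimension equal to the number of classes. The function name `simplex_etf` and the CIFAR-10-style choice `K = 10` are illustrative assumptions.

```python
import math

import torch
import torch.nn as nn


def simplex_etf(num_classes: int) -> torch.Tensor:
    """Columns are the K vertices of a Simplex Equiangular Tight Frame:
    unit-norm vectors whose pairwise inner products are all -1/(K-1)."""
    K = num_classes
    return math.sqrt(K / (K - 1)) * (torch.eye(K) - torch.ones(K, K) / K)


K = 10  # e.g., a 10-class problem such as CIFAR-10 (assumed for illustration)
W = simplex_etf(K)
G = W.T @ W
print(torch.allclose(torch.diag(G), torch.ones(K)))           # unit-norm vertices
print(torch.allclose(G[0, 1], torch.tensor(-1.0 / (K - 1))))  # equiangular: -1/(K-1)

# Fix the last layer as a frozen Simplex ETF classifier, with the feature
# dimension set to the number of classes K, as described in the abstract.
classifier = nn.Linear(K, K, bias=False)
with torch.no_grad():
    classifier.weight.copy_(W.T)  # row i = fixed classifier for class i
classifier.weight.requires_grad_(False)  # no gradients, no optimizer state
```

Because the frozen classifier needs neither gradients nor optimizer state, and the feature dimension shrinks to K, this sketch suggests where the memory savings reported in the abstract come from.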
Date and Time
- Date: 09 Jun 2022
- Time: 05:30 PM to 07:00 PM
- All times are (GMT-07:00) US/Mountain

Location
- 2155 East Wesley Avenue
- Denver, Colorado
- United States 80208

Hosts
- Co-sponsored by Christopher Reardon, Eric Ericson

Registration
- https://r5.ieee.org/denver-cs/upcoming-presenters/
Speakers
Dr. Zhihui Zhu of the University of Denver
Understanding Neural Collapse in Deep Learning
Biography:
Zhihui Zhu is currently an Assistant Professor with the Department of Electrical and Computer Engineering, University of Denver, CO, USA. He received the Ph.D. degree in electrical engineering from the Colorado School of Mines, Golden, CO, USA, in 2017. He was a Post-Doctoral Fellow with the Mathematical Institute for Data Science, Johns Hopkins University, Baltimore, MD, USA, from 2018 to 2019. His research interests include the exploitation of inherent low-dimensional structures within data and signals, and the design, analysis, and implementation of optimization algorithms for machine learning and signal processing.