Towards Understanding the Generalization Mystery in Deep Learning
A big open question in deep learning is the following: Why do over-parameterized neural networks trained with gradient descent (GD) generalize well on real datasets even though they are capable of fitting random datasets of comparable size? Furthermore, from among all solutions that fit the training data, how does GD find one that generalizes well (when such a well-generalizing solution exists)? In this talk we argue that the answer to both questions lies in the interaction of the gradients of different examples during training, and present a new theory based on this idea. The theory also explains a number of other phenomena in deep learning, such as why some examples are reliably learned earlier than others, why early stopping works, and why it is possible to learn from noisy labels. Moreover, since the theory provides a causal explanation of how GD finds a well-generalizing solution when one exists, it motivates a class of simple modifications to GD that attenuate memorization and improve generalization.
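The gradient-interaction idea can be made concrete with a toy sketch. The snippet below is only an illustration of the general flavor of such methods, not the speaker's actual algorithm: the sign-agreement rule and the majority threshold are assumptions made for this example. It trains a small linear-regression model with a gradient-descent variant that keeps only the gradient coordinates on which a clear majority of examples agree in sign, so that directions supported by many examples drive learning while idiosyncratic directions are suppressed.

```python
import numpy as np

# Toy setup: linear regression y = X @ w_true + noise.
rng = np.random.default_rng(0)
n, d = 32, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
lr = 0.1

for step in range(300):
    # Per-example gradient of the squared loss 0.5 * (x_i @ w - y_i)^2.
    residuals = X @ w - y                       # shape (n,)
    per_example_grads = residuals[:, None] * X  # shape (n, d)

    # Plain GD would use the mean: g = per_example_grads.mean(axis=0).
    # Agreement-filtered variant (illustrative assumption): keep a
    # coordinate only when a clear majority of examples agree on its
    # sign, so the update follows directions many examples share.
    agreement = np.abs(np.sign(per_example_grads).mean(axis=0))  # in [0, 1]
    g = per_example_grads.mean(axis=0) * (agreement > 0.5)

    w -= lr * g

print("recovered w:", np.round(w, 3))
print("true w:     ", np.round(w_true, 3))
```

Modifications in this spirit plausibly attenuate memorization as the abstract describes: a mislabeled example produces gradients that rarely agree with those of the rest of the data, so its influence on the update is damped.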
Date and Time
- Date: 16 Nov 2022
- Time: 01:00 PM UTC to 02:00 PM UTC
Location
- EPFL
- Lausanne, Switzerland
- Building: INF
- Room Number: 328
Speakers
Dr. Sat Chatterjee
Biography:
Sat is an engineering leader and machine learning researcher who until recently was at Google AI. His current research focuses on fundamental questions in deep learning (such as understanding why neural networks generalize at all) as well as applications of ML ranging from hardware design and verification to quantitative finance. Before Google, he was a Senior Vice President at Two Sigma, a leading quantitative investment manager, where he founded one of the first successful deep-learning-based alpha research groups on Wall Street and led a team that built one of the earliest end-to-end FPGA-based trading systems for general-purpose, ultra-low-latency trading. Prior to that, he was a Research Scientist at Intel, where he worked on microarchitectural performance analysis and formal verification for on-chip networks. He did his undergraduate studies at IIT Bombay, holds a PhD in Computer Science from UC Berkeley, and has published in top machine learning, design automation, and formal verification conferences.