Towards Understanding the Generalization Mystery in Deep Learning

#DeepLearning #MachineLearning #NeuralNetworks

A big open question in deep learning is the following: Why do over-parameterized neural networks trained with gradient descent (GD) generalize well on real datasets even though they are capable of fitting random datasets of comparable size? Furthermore, from among all solutions that fit the training data, how does GD find one that generalizes well (when such a well-generalizing solution exists)? In this talk we argue that the answer to both questions lies in the interaction of the gradients of different examples during training, and present a new theory based on this idea. The theory also explains a number of other phenomena in deep learning, such as why some examples are reliably learned earlier than others, why early stopping works, and why it is possible to learn from noisy labels. Moreover, since the theory provides a causal explanation of how GD finds a well-generalizing solution when one exists, it motivates a class of simple modifications to GD that attenuate memorization and improve generalization.
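Since the abstract turns on how the gradients of different examples interact during training, a minimal sketch may help make the idea concrete. The toy NumPy example below is not the speaker's algorithm; it is one illustrative "simple modification to GD" in the spirit described: per-example gradients on a linear model are averaged, but a parameter coordinate is updated only when most examples agree on the sign of its gradient, so directions pushed by only a few (possibly noisily labeled) examples are attenuated. The model, the data, and the agreement_threshold value are assumptions made purely for illustration.

```python
import numpy as np

# Toy sketch (NOT the speaker's exact method): gradient descent where a
# coordinate is updated only if most per-example gradients agree on its
# sign. Coordinates driven by a few outliers (e.g. noisy labels) are
# suppressed, which attenuates memorization of those examples.

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
y[:20] = rng.normal(size=20)            # corrupt some labels ("noise")

w = np.zeros(d)
lr, agreement_threshold = 0.01, 0.7     # threshold is an assumption

for step in range(500):
    residual = X @ w - y                        # shape (n,)
    per_example_grads = residual[:, None] * X   # shape (n, d)
    mean_grad = per_example_grads.mean(axis=0)
    # Fraction of examples whose gradient sign matches the mean gradient,
    # per coordinate: a crude measure of gradient coherence.
    agree = (np.sign(per_example_grads) == np.sign(mean_grad)).mean(axis=0)
    masked_grad = np.where(agree >= agreement_threshold, mean_grad, 0.0)
    w -= lr * masked_grad

print("recovered vs. true weights (first 3):", w[:3], w_true[:3])
```

Zeroing low-agreement coordinates, rather than clipping or reweighting them, is simply the shortest way to express sign-agreement-based coherence in code; on this toy problem the masked update tends to track the cleanly labeled examples rather than the corrupted ones.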



  Date and Time

  • Date: 16 Nov 2022
  • Time: 02:00 PM to 03:00 PM
  • All times are (UTC+01:00) Bern

  Location

  • EPFL
  • Lausanne, Switzerland
  • Building: INF
  • Room Number: 328

  Registration

  • Starts: 24 October 2022, 07:52 PM
  • Ends: 16 November 2022, 02:12 PM
  • All times are (UTC+01:00) Bern
  • No Admission Charge


  Speakers

Dr. Sat Chatterjee

Biography:

Sat is an engineering leader and machine learning researcher who was at Google AI until recently. His current research focuses on fundamental questions in deep learning (such as understanding why neural networks generalize at all) as well as applications of ML ranging from hardware design and verification to quantitative finance. Before Google, he was a Senior Vice President at Two Sigma, a leading quantitative investment manager, where he founded one of the first successful deep-learning-based alpha research groups on Wall Street and led a team that built one of the earliest end-to-end FPGA-based trading systems for general-purpose ultra-low-latency trading. Prior to that, he was a Research Scientist at Intel, where he worked on microarchitectural performance analysis and formal verification for on-chip networks. He did his undergraduate studies at IIT Bombay, holds a PhD in Computer Science from UC Berkeley, and has published in top machine learning, design automation, and formal verification conferences.
