Towards Understanding the Generalization Mystery in Deep Learning

#DeepLearning #MachineLearning #NeuralNetworks

A big open question in deep learning is the following: Why do over-parameterized neural networks trained with gradient descent (GD) generalize well on real datasets even though they are capable of fitting random datasets of comparable size? Furthermore, from among all solutions that fit the training data, how does GD find one that generalizes well (when such a well-generalizing solution exists)? In this talk we argue that the answer to both questions lies in the interaction of the gradients of different examples during training, and present a new theory based on this idea. The theory also explains a number of other phenomena in deep learning, such as why some examples are reliably learned earlier than others, why early stopping works, and why it is possible to learn from noisy labels. Moreover, since the theory provides a causal explanation of how GD finds a well-generalizing solution when one exists, it motivates a class of simple modifications to GD that attenuate memorization and improve generalization.
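Since the abstract turns on how the gradients of different examples interact during training, a minimal sketch may help make the idea concrete. The toy NumPy example below is not the speaker's algorithm; it is one illustrative "simple modification to GD" in the spirit described: per-example gradients on a linear model are averaged, but a parameter coordinate is updated only when most examples agree on the sign of its gradient, so directions pushed by only a few (possibly noisily labeled) examples are attenuated. The model, the data, and the agreement_threshold value are assumptions made purely for illustration.

```python
import numpy as np

# Toy sketch (NOT the speaker's exact method): gradient descent where a
# coordinate is updated only if most per-example gradients agree on its
# sign. Coordinates driven by a few outliers (e.g. noisy labels) are
# suppressed, which attenuates memorization of those examples.

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
y[:20] = rng.normal(size=20)            # corrupt some labels ("noise")

w = np.zeros(d)
lr, agreement_threshold = 0.01, 0.7     # threshold is an assumption

for step in range(500):
    residual = X @ w - y                        # shape (n,)
    per_example_grads = residual[:, None] * X   # shape (n, d)
    mean_grad = per_example_grads.mean(axis=0)
    # Fraction of examples whose gradient sign matches the mean gradient,
    # per coordinate: a crude measure of gradient coherence.
    agree = (np.sign(per_example_grads) == np.sign(mean_grad)).mean(axis=0)
    masked_grad = np.where(agree >= agreement_threshold, mean_grad, 0.0)
    w -= lr * masked_grad

print("recovered vs. true weights (first 3):", w[:3], w_true[:3])
```

Zeroing low-agreement coordinates, rather than clipping or reweighting them, is simply the shortest way to express sign-agreement-based coherence in code; on this toy problem the masked update tends to track the cleanly labeled examples rather than the corrupted ones.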



  Date and Time

  • Date: 16 Nov 2022
  • Time: 02:00 PM to 03:00 PM
  • All times are (UTC+01:00) Bern

  Location

  • EPFL
  • Lausanne, Switzerland
  • Building: INF
  • Room Number: 328

  Registration

  • Starts: 24 October 2022, 07:52 PM
  • Ends: 16 November 2022, 02:12 PM
  • All times are (UTC+01:00) Bern
  • No Admission Charge


  Speakers

Dr. Sat Chatterjee

Biography:

Sat is an engineering leader and machine learning researcher who was at Google AI until recently. His current research focuses on fundamental questions in deep learning (such as understanding why neural networks generalize at all) as well as applications of ML ranging from hardware design and verification to quantitative finance. Before Google, he was a Senior Vice President at Two Sigma, a leading quantitative investment manager, where he founded one of the first successful deep-learning-based alpha research groups on Wall Street and led a team that built one of the earliest end-to-end FPGA-based trading systems for general-purpose ultra-low-latency trading. Prior to that, he was a Research Scientist at Intel, where he worked on microarchitectural performance analysis and formal verification for on-chip networks. He did his undergraduate studies at IIT Bombay, holds a PhD in Computer Science from UC Berkeley, and has published in top machine learning, design automation, and formal verification conferences.
