BEGIN:VCALENDAR
VERSION:2.0
PRODID:IEEE vTools.Events//EN
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:Europe/Zurich
BEGIN:DAYLIGHT
DTSTART:20230326T030000
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:20221030T020000
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20221116T160757Z
UID:95AB00E5-2B2E-46A9-8E6E-F198548071D1
DTSTART;TZID=Europe/Zurich:20221116T140000
DTEND;TZID=Europe/Zurich:20221116T150000
DESCRIPTION:A big open question in deep learning is the following: Why do o
 ver-parameterized neural networks trained with gradient descent (GD) gener
 alize well on real datasets even though they are capable of fitting random
  datasets of comparable size? Furthermore\, from among all solutions that 
 fit the training data\, how does GD find one that generalizes well (when s
 uch a well-generalizing solution exists)? In this talk we argue that the a
 nswer to both questions lies in the interaction of the gradients of differ
 ent examples during training\, and present a new theory based on this idea
 . The theory also explains a number of other phenomena in deep learning\, 
 such as why some examples are reliably learned earlier than others\, why e
 arly stopping works\, and why it is possible to learn from noisy labels. M
 oreover\, since the theory provides a causal explanation of how GD finds a
  well-generalizing solution when one exists\, it motivates a class of simp
 le modifications to GD that attenuate memorization and improve generalizat
 ion.\n\nSpeaker(s): Dr. Chatterjee\n\nRoom: 328\, Bldg: INF\, EPFL\, Laus
 anne\, Switzerland\, Virtual: https://events.vtools.ieee.org/m/329319
LOCATION:Room: 328\, Bldg: INF\, EPFL\, Lausanne\, Switzerland\, Virtu
 al: https://events.vtools.ieee.org/m/329319
ORGANIZER:mailto:andreas.burg@epfl.ch
SEQUENCE:3
SUMMARY:Towards Understanding the Generalization Mystery in Deep Learning
URL;VALUE=URI:https://events.vtools.ieee.org/m/329319
X-ALT-DESC:Description: <br /><p>A big open question in deep learning is th
 e following:&nbsp\;Why do over-parameterized neural networks trained with 
 gradient descent (GD) generalize well on real datasets even though they ar
 e capable of fitting random datasets of comparable size? Furthermore\, fro
 m among all solutions that fit the training data\, how does GD find one th
 at generalizes well (when such a well-generalizing solution exists)? In th
 is talk we argue that the answer to both questions lies in the interaction
  of the gradients of different examples during training\, and present a ne
 w theory&nbsp\;based on this idea.&nbsp\;The theory also explains a number
  of other phenomena in deep learning\, such as why some examples are relia
 bly learned earlier than others\, why early stopping works\, and why it is
  possible to learn from noisy labels. Moreover\, since the theory provides
  a causal explanation of how GD finds a well-generalizing solution when on
 e exists\, it motivates a class of simple modifications to GD that attenua
 te memorization and improve generalization.</p>
END:VEVENT
END:VCALENDAR

