BEGIN:VCALENDAR
VERSION:2.0
PRODID:IEEE vTools.Events//EN
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:Canada/Pacific
BEGIN:DAYLIGHT
DTSTART:20210314T030000
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
RRULE:FREQ=YEARLY;BYDAY=2SU;BYMONTH=3
TZNAME:PDT
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:20211107T010000
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=11
TZNAME:PST
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210602T064852Z
UID:AD1B6F98-CCE0-4726-8F8A-C7EA9A514732
DTSTART;TZID=Canada/Pacific:20210616T160000
DTEND;TZID=Canada/Pacific:20210616T180000
DESCRIPTION:Recent advances in machine learning (ML) systems have made it 
 significantly easier to train ML models given a training set. However\, o
 ur understanding of the model training process has not improved at the sa
 me pace. Consequently\, a number of key questions remain: How can we syst
 ematically assign importance or value to training data with respect to th
 e utility of the trained models\, be it accuracy\, fairness\, or robustne
 ss? What impact does noise in the training data\, whether injected by noi
 sy data acquisition processes or by adversarial parties\, have on the tra
 ined models? How can we find the right data to clean and label in order t
 o improve the utility of the trained models? Just as we are starting to u
 nderstand these questions for ML models in isolation\, we must face the r
 eality that most real-world ML applications are far more complex than a s
 ingle ML model.\n\nIn this talk\, Professor Ce Zhang will revisit these q
 uestions for an end-to-end ML pipeline\, which consists of a noise model 
 for data and a feature extraction pipeline\, followed by the training of 
 an ML model. In the first part of the talk\, he will introduce some recen
 t theoretical results in an abstract way: How can we calculate the Shaple
 y value of a training example for ML models trained over feature extracto
 rs\, modeled as a polynomial in the provenance semiring? How can we compu
 te the entropy and expectation of ML models trained over uncertain data\,
  modeled as a Codd table? As we will see\, even these problems are #P-har
 d for general ML models. Surprisingly\, though\, we can obtain PTIME algo
 rithms for a simpler proxy model (namely a K-nearest-neighbor classifier)
  for a large family of polynomials\, input noise distributions\, and util
 ities.\n\nProfessor Ce Zhang will then put these theoretical results into
  practice. Given a set of heuristics and a proxy model that approximate a
  real-world end-to-end ML pipeline as these abstract problems\, he will p
 resent a principled framework for three applications: (1) certifiable def
 ence against backdoor attacks\, (2) targeted data cleaning for ML\, and (
 3) data valuation and debugging for end-to-end ML pipelines. He will desc
 ribe both positive empirical results and cases where the current approach
  fails.\n\nThe video recordings and slides of the previous seminars in th
 is series are available on the webinar series webpage <http://data.cs.sfu
 .ca/tdsa.html>.\n\nThe seminar is open to the public free of charge. Zoom
  registration is required for access to the meeting link. This helps ensu
 re quality discussions among participants from industry and academia.\n\n
 Virtual: https://events.vtools.ieee.org/m/273758
LOCATION:Virtual: https://events.vtools.ieee.org/m/273758
ORGANIZER:mailto:Bob_Gill@bcit.ca
SEQUENCE:2
SUMMARY:Seminar Series on Trustworthy Data Science and AI: Toward Understan
 ding End-to-End Learning in the Context of Data by Professor Ce Zhang
URL;VALUE=URI:https://events.vtools.ieee.org/m/273758
X-ALT-DESC;FMTTYPE=text/html:<p>Recent advances in machine learning (ML) s
 ystems have made it significantly easier to train ML models given a train
 ing set. However\, our understanding of the model training process has no
 t improved at the same pace. Consequently\, a number of key questions rem
 ain: How can we systematically assign importance or value to training dat
 a with respect to the utility of the trained models\, be it accuracy\, fa
 irness\, or robustness? What impact does noise in the training data\, whe
 ther injected by noisy data acquisition processes or by adversarial parti
 es\, have on the trained models? How can we find the right data to clean 
 and label in order to improve the utility of the trained models? Just as 
 we are starting to understand these questions for ML models in isolation
 \, we must face the reality that most real-world ML applications are far 
 more complex than a single ML model.</p>\n<p>In this talk\, Professor Ce 
 Zhang will revisit these questions for an end-to-end ML pipeline\, which 
 consists of a noise model for data and a feature extraction pipeline\, fo
 llowed by the training of an ML model. In the first part of the talk\, he
  will introduce some recent theoretical results in an abstract way: How c
 an we calculate the Shapley value of a training example for ML models tra
 ined over feature extractors\, modeled as a polynomial in the provenance 
 semiring? How can we compute the entropy and expectation of ML models tra
 ined over uncertain data\, modeled as a Codd table? As we will see\, even
  these problems are #P-hard for general ML models. Surprisingly\, though
 \, we can obtain PTIME algorithms for a simpler proxy model (namely a K-n
 earest-neighbor classifier) for a large family of polynomials\, input noi
 se distributions\, and utilities.</p>\n<p>Professor Ce Zhang will then pu
 t these theoretical results into practice. Given a set of heuristics and 
 a proxy model that approximate a real-world end-to-end ML pipeline as the
 se abstract problems\, he will present a principled framework for three a
 pplications: (1) certifiable defence against backdoor attacks\, (2) targe
 ted data cleaning for ML\, and (3) data valuation and debugging for end-t
 o-end ML pipelines. He will describe both positive empirical results and 
 cases where the current approach fails.</p>\n<p>The video recordings and 
 slides of the previous seminars in this series are available on the webin
 ar series webpage <a href="http://data.cs.sfu.ca/tdsa.html">http://data.c
 s.sfu.ca/tdsa.html</a>.</p>\n<p><strong>The seminar is open to the public
  free of charge.&nbsp\;Zoom registration is required for access to the me
 eting link.</strong>&nbsp\;This helps ensure quality discussions among pa
 rticipants from industry and academia.</p>
END:VEVENT
END:VCALENDAR

