BEGIN:VCALENDAR
VERSION:2.0
PRODID:IEEE vTools.Events//EN
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:Canada/Eastern
BEGIN:DAYLIGHT
DTSTART:20220313T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
RRULE:FREQ=YEARLY;BYDAY=2SU;BYMONTH=3
TZNAME:EDT
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:20211107T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=11
TZNAME:EST
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20220328T183531Z
UID:32F7126A-0CE6-497D-99C7-F50847648680
DTSTART;TZID=Canada/Eastern:20211123T170000
DTEND;TZID=Canada/Eastern:20211123T183000
DESCRIPTION:Prerequisites:\nYou do not need to have attended the earli
 er talks. If you know zero math and zero machine learning\, then thi
 s talk is for you. Jeff will do his best to explain fairly hard math
 ematics to you. If you know a bunch of math and/or a bunch of machin
 e learning\, then these talks are for you. Jeff tries to spin the id
 eas in new ways.\nLonger Abstract:\nAt the risk of being non-standar
 d\, Jeff will tell you the way he thinks about this topic. Both "Gam
 e Trees" and "Markov Chains" represent the graph of states through w
 hich your agent will traverse a path while completing the task. Supp
 ose we could learn for each such state a value measuring "how good" t
 his state is for the agent. Then completing the task in an optimal w
 ay would be easy. If our current state is one within which our agent g
 ets to choose the next action\, then she will choose the action that m
 aximizes the value of our next state. On the other hand\, if our adv
 ersary gets to choose\, he will choose the action that minimizes thi
 s value. Finally\, if our current state is one within which the univ
 erse flips a coin\, then each edge leaving this state will be labele
 d with the probability of taking it. Knowing that this is how the ga
 me is played\, we can compute how good each state is. A state in whi
 ch the task is complete is worth whatever reward the agent receives i
 n that state. These values somehow trickle backwards until we learn t
 he value of the start state. The computational challenge is that the
 re are way more states than we can ever look at.\n\nSpeaker(s): Prof
 . Jeff Edmonds\, \n\nVirtual: https://events.vtools.ieee.org/m/28773
 7
LOCATION:Virtual: https://events.vtools.ieee.org/m/287737
ORGANIZER:mailto:ayda.naserialiabadi@ryerson.ca
SEQUENCE:1
SUMMARY:Reinforcement Learning Game Tree / Markov Chains
URL;VALUE=URI:https://events.vtools.ieee.org/m/287737
X-ALT-DESC;FMTTYPE=text/html:Description: <br /><p><strong>Prerequis
 ites:</strong><br />You do not need to have attended the earlier tal
 ks. If you know zero math and zero machine learning\, then this talk i
 s for you. Jeff will do his best to explain fairly hard mathematics t
 o you. If you know a bunch of math and/or a bunch of machine learnin
 g\, then these talks are for you. Jeff tries to spin the ideas in ne
 w ways.<br /><strong>Longer Abstract:</strong><br />At the risk of b
 eing non-standard\, Jeff will tell you the way he thinks about this t
 opic. Both "Game Trees" and "Markov Chains" represent the graph of s
 tates through which your agent will traverse a path while completing t
 he task. Suppose we could learn for each such state a value measurin
 g "how good" this state is for the agent. Then completing the task i
 n an optimal way would be easy. If our current state is one within w
 hich our agent gets to choose the next action\, then she will choose t
 he action that maximizes the value of our next state. On the other h
 and\, if our adversary gets to choose\, he will choose the action th
 at minimizes this value. Finally\, if our current state is one withi
 n which the universe flips a coin\, then each edge leaving this stat
 e will be labeled with the probability of taking it. Knowing that th
 is is how the game is played\, we can compute how good each state is
 . A state in which the task is complete is worth whatever reward the a
 gent receives in that state. These values somehow trickle backwards u
 ntil we learn the value of the start state. The computational challe
 nge is that there are way more states than we can ever look at.</p>
END:VEVENT
END:VCALENDAR
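
The abstract above describes a value backup over a graph with agent (max),
adversary (min), and coin-flip (chance) states. Below, kept outside the
calendar object so the iCalendar data stays intact, is a minimal Python
sketch of that backup; the state labels, the value function, and the toy
numbers are illustrative assumptions, not material from the talk.

from typing import Dict

# Hypothetical encoding: kind[s] is "MAX" (agent chooses), "MIN" (adversary
# chooses), "CHANCE" (the universe flips a weighted coin), or "TERMINAL"
# (task complete). For MAX/MIN states, edges[s] lists successor states; for
# CHANCE states it lists (probability, successor) pairs, the probability of
# taking each edge. reward[s] is what the agent receives in a terminal state.

def value(s: str, kind: Dict[str, str], edges: Dict[str, list],
          reward: Dict[str, float]) -> float:
    """How good state s is for the agent, backed up from its successors."""
    k = kind[s]
    if k == "TERMINAL":
        # A state in which the task is complete is worth its reward.
        return reward[s]
    if k == "MAX":
        # The agent chooses the action maximizing the next state's value.
        return max(value(t, kind, edges, reward) for t in edges[s])
    if k == "MIN":
        # The adversary chooses the action minimizing this value.
        return min(value(t, kind, edges, reward) for t in edges[s])
    # CHANCE: probability-weighted average over the labeled edges.
    return sum(p * value(t, kind, edges, reward) for p, t in edges[s])

# Made-up example: the agent either takes a sure reward of 3 or gambles on
# a fair coin flip between rewards 10 and 0.
kind = {"start": "MAX", "gamble": "CHANCE",
        "sure": "TERMINAL", "win": "TERMINAL", "lose": "TERMINAL"}
edges = {"start": ["gamble", "sure"], "gamble": [(0.5, "win"), (0.5, "lose")]}
reward = {"sure": 3.0, "win": 10.0, "lose": 0.0}
print(value("start", kind, edges, reward))  # 5.0: the gamble beats 3.0

Recursing exhaustively like this visits every reachable state, which is
precisely the computational challenge the abstract names: real tasks have
far more states than anyone can ever look at, so reinforcement learning
approximates these values instead of enumerating them.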