BEGIN:VCALENDAR
VERSION:2.0
PRODID:IEEE vTools.Events//EN
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:US/Eastern
BEGIN:DAYLIGHT
DTSTART:20220313T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
RRULE:FREQ=YEARLY;BYDAY=2SU;BYMONTH=3
TZNAME:EDT
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:20221106T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=11
TZNAME:EST
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20220328T153549Z
UID:9B09C020-9AA6-4352-8F21-1B4B8DAFCFE8
DTSTART;TZID=US/Eastern:20220323T120000
DTEND;TZID=US/Eastern:20220323T130000
DESCRIPTION:Policy optimization\, which learns the policy of interes
 t by maximizing the value function via large-scale optimization tec
 hniques\, lies at the heart of modern reinforcement learning (RL). I
 n addition to value maximization\, other practical considerations co
 mmonly arise as well\, including the need to encourage exploration a
 nd to ensure certain structural properties of the learned policy du
 e to safety\, resource\, and operational constraints. These consider
 ations can often be accounted for by resorting to regularized RL\, w
 hich augments the target value function with a structure-promoting r
 egularization term such as Shannon entropy\, Tsallis entropy\, and l
 og-barrier functions. Focusing on an infinite-horizon discounted Mar
 kov decision process\, this talk first shows that entropy-regularize
 d natural policy gradient methods converge globally at a linear rat
 e that is nearly independent of the dimension of the state-action sp
 ace\, whereas the vanilla softmax policy gradient method may take ex
 ponential time to converge. Next\, a generalized policy mirror desce
 nt algorithm is proposed to accommodate a general class of convex re
 gularizers beyond Shannon entropy\, even when the regularizer lacks s
 trong convexity and smoothness. Time permitting\, we will discuss ho
 w these ideas can be leveraged to solve zero-sum Markov games. Our r
 esults accommodate a wide range of learning rates and shed light on t
 he role of regularization in enabling fast convergence in RL.\n\nCo-
 sponsored by: North Jersey Section\n\nSpeaker(s): Dr. Yuejie Chi\n\n
 Room: M105\, Bldg: Muscarelle Center\, 1000 River Road\, Teaneck\, N
 ew Jersey\, United States\, 07666\, Virtual: https://events.vtools.i
 eee.org/m/304875
LOCATION:Room: M105\, Bldg: Muscarelle Center\, 1000 River Road\, Tea
 neck\, New Jersey\, United States\, 07666\, Virtual: https://events.
 vtools.ieee.org/m/304875
ORGANIZER:mailto:zhao@fdu.edu
SEQUENCE:3
SUMMARY:Policy Optimization in Reinforcement Learning: A Tale of Preconditi
 oning and Regularization
URL;VALUE=URI:https://events.vtools.ieee.org/m/304875
X-ALT-DESC;FMTTYPE=text/html:<p>Policy optimization\, which learns t
 he policy of interest by maximizing the value function via large-sc
 ale optimization techniques\, lies at the heart of modern reinforcem
 ent learning (RL). In addition to value maximization\, other practic
 al considerations commonly arise as well\, including the need to enc
 ourage exploration and to ensure certain structural properties of t
 he learned policy due to safety\, resource\, and operational constra
 ints. These considerations can often be accounted for by resorting t
 o regularized RL\, which augments the target value function with a s
 tructure-promoting regularization term such as Shannon entropy\, Tsa
 llis entropy\, and log-barrier functions. Focusing on an infinite-ho
 rizon discounted Markov decision process\, this talk first shows tha
 t entropy-regularized natural policy gradient methods converge globa
 lly at a linear rate that is nearly independent of the dimension of t
 he state-action space\, whereas the vanilla softmax policy gradient m
 ethod may take exponential time to converge. Next\, a generalized po
 licy mirror descent algorithm is proposed to accommodate a general c
 lass of convex regularizers beyond Shannon entropy\, even when the r
 egularizer lacks strong convexity and smoothness. Time permitting\, w
 e will discuss how these ideas can be leveraged to solve zero-sum Ma
 rkov games. Our results accommodate a wide range of learning rates a
 nd shed light on the role of regularization in enabling fast converg
 ence in RL.</p>
END:VEVENT
END:VCALENDAR

