SPS Oregon Chapter Seminar: Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective
Learning to rank is a key problem in information retrieval and machine learning and a core component of modern search engines and recommender systems. Off-policy learning to rank aims to optimize a ranker from implicit user feedback (e.g., clicks) collected by a deployed logging policy. However, existing off-policy learning-to-rank methods often make strong assumptions about how users generate the click data (i.e., the click model), and hence must be tailored to each specific click model. In this talk, I will introduce how to unify the ranking process under stochastic click models as a Markov Decision Process, and then discuss our work on leveraging offline reinforcement learning methods for click-model-agnostic off-policy learning to rank.
Date and Time
- Date: 30 Nov 2023
- Time: 07:00 PM UTC to 07:59 PM UTC
Speakers
Huazheng Wang
Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective
Biography:
Huazheng Wang is an assistant professor in the School of Electrical Engineering and Computer Science at Oregon State University. He was a postdoctoral research associate at Princeton University. He received his Ph.D. in Computer Science from the University of Virginia and his B.E. from the University of Science and Technology of China. His research interests include reinforcement learning and information retrieval. His recent focus is developing efficient and robust reinforcement learning and multi-armed bandit algorithms with applications to online recommendation and ranking systems. He co-organized tutorials at KDD 2020 and SIGIR 2021 on interactive information retrieval and exploration. He is a recipient of the SIGIR 2019 Best Paper Award.