Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems

  • Tianchi Cai*
  • Shenliao Bao
  • Jiyan Jiang
  • Shiji Zhou
  • Wenpeng Zhang
  • Lihong Gu
  • Jinjie Gu
  • Guannan Zhang
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Model-free RL-based recommender systems have recently received increasing research attention due to their capability to handle partial feedback and long-term rewards. However, most existing research has ignored a critical feature of recommender systems: a user's feedback on the same item at different times is random. This stochastic-reward property differs fundamentally from classic RL scenarios with deterministic rewards and makes RL-based recommendation much more challenging. In this paper, we first demonstrate in a simulator environment that using stochastic feedback directly results in a significant drop in performance. Then, to handle the stochastic feedback more efficiently, we design two stochastic reward stabilization frameworks that replace the direct stochastic feedback with feedback learned by a supervised model. Both frameworks are model-agnostic, i.e., they can effectively utilize various supervised models. We demonstrate the superiority of the proposed frameworks over different RL-based recommendation baselines with extensive experiments on a recommendation simulator as well as an industrial-level recommender system.
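
The general idea of substituting a learned reward estimate for the raw stochastic feedback can be illustrated with a toy sketch. This is not either of the paper's two frameworks, whose details the abstract does not give. The function names, the smoothed per-pair mean used as a stand-in "supervised model", and the simulated click probabilities are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def fit_mean_reward_model(transitions, prior=0.5, prior_weight=1.0):
    """Hypothetical stand-in for a supervised model.

    Estimates the expected reward of each (state, action) pair as a
    smoothed mean of the observed rewards.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for state, action, reward, _next_state in transitions:
        sums[(state, action)] += reward
        counts[(state, action)] += 1

    def predict(state, action):
        n = counts[(state, action)]
        return (sums[(state, action)] + prior * prior_weight) / (n + prior_weight)

    return predict


def stabilize_rewards(transitions, predict_reward):
    """Replace each observed reward with the model's estimate."""
    return [
        (state, action, predict_reward(state, action), next_state)
        for state, action, _reward, next_state in transitions
    ]


# Simulated logged feedback (assumed values): the same user responds
# randomly to the same item, so the observed reward is a 0/1 draw.
rng = random.Random(0)
true_click_prob = {("u1", "item_a"): 0.8, ("u1", "item_b"): 0.2}
logged = []
for _ in range(2000):
    item = rng.choice(["item_a", "item_b"])
    clicked = 1.0 if rng.random() < true_click_prob[("u1", item)] else 0.0
    logged.append(("u1", item, clicked, "u1"))

predict = fit_mean_reward_model(logged)
stabilized = stabilize_rewards(logged, predict)
```

An RL learner trained on `stabilized` instead of `logged` would see one consistent reward per (state, action) pair rather than a noisy 0/1 sample. Any supervised predictor exposing the same `predict(state, action)` interface could be swapped in, which is the sense in which such a scheme is model-agnostic.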

Original language: English
Title of host publication: SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 2179-2183
Number of pages: 5
ISBN (Electronic): 9781450394086
DOIs
State: Published - 18 Jul 2023
Externally published: Yes
Event: 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023 - Taipei, Taiwan, Province of China
Duration: 23 Jul 2023 to 27 Jul 2023

Publication series

Name: SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 23/07/23 to 27/07/23

Keywords

  • Recommender System
  • Reinforcement Learning
