TY - GEN
T1 - Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems
AU - Cai, Tianchi
AU - Bao, Shenliao
AU - Jiang, Jiyan
AU - Zhou, Shiji
AU - Zhang, Wenpeng
AU - Gu, Lihong
AU - Gu, Jinjie
AU - Zhang, Guannan
N1 - Publisher Copyright:
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2023/7/18
Y1 - 2023/7/18
AB - Model-free RL-based recommender systems have recently received increasing research attention due to their capability to handle partial feedback and long-term rewards. However, most existing research has ignored a critical feature of recommender systems: one user's feedback on the same item at different times is random. This stochastic reward property essentially differs from the deterministic rewards of classic RL scenarios, which makes RL-based recommender systems much more challenging. In this paper, we first demonstrate in a simulator environment that using direct stochastic feedback results in a significant drop in performance. Then, to handle the stochastic feedback more efficiently, we design two stochastic reward stabilization frameworks that replace the direct stochastic feedback with feedback learned by a supervised model. Both frameworks are model-agnostic, i.e., they can effectively utilize various supervised models. We demonstrate the superiority of the proposed frameworks over different RL-based recommendation baselines with extensive experiments on a recommendation simulator as well as an industrial-level recommender system.
KW - Recommender System
KW - Reinforcement Learning
UR - https://www.scopus.com/pages/publications/85168660723
U2 - 10.1145/3539618.3592022
DO - 10.1145/3539618.3592022
M3 - Conference contribution
AN - SCOPUS:85168660723
T3 - SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 2179
EP - 2183
BT - SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery, Inc
T2 - 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023
Y2 - 23 July 2023 through 27 July 2023
ER -