TY - GEN
T1 - I4R
T2 - 29th International Joint Conference on Artificial Intelligence, IJCAI 2020
AU - Luo, Xufang
AU - Meng, Qi
AU - He, Di
AU - Chen, Wei
AU - Wang, Yunhong
N1 - Publisher Copyright: © 2020 Inst. Sci. inf., Univ. Defence in Belgrade. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Learning expressive representations is crucial for well-performing policies in deep reinforcement learning (DRL). Unlike in supervised learning, accurate targets are not always available in DRL, and some inputs that call for different actions differ only slightly, which increases the demand for expressive representations. In this paper, we first empirically compare the representations of DRL models with different performance levels. We observe that, when visualized, the representations of a better state extractor (SE) are more scattered than those of a worse one. We therefore investigate the singular values of the representation matrix and find that better SEs always correspond to smaller differences among these singular values. Next, based on these observations, we define an indicator of the representations of a DRL model: the Number of Significant Singular Values (NSSV) of a representation matrix. We then propose the I4R algorithm, which improves DRL algorithms by adding a corresponding regularization term that increases the NSSV. Finally, we apply I4R to both policy-gradient and value-based algorithms on Atari games, and the results show the superiority of the proposed method.
UR - https://www.scopus.com/pages/publications/85097342850
M3 - Conference contribution
AN - SCOPUS:85097342850
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 2669
EP - 2675
BT - Proceedings of the 29th International Joint Conference on Artificial Intelligence, IJCAI 2020
A2 - Bessiere, Christian
PB - International Joint Conferences on Artificial Intelligence
Y2 - 1 January 2021
ER -