TY - GEN
T1 - UNO! UNified Offline Training Paradigm for Learning Path Recommendation
AU - Peng, Linzhi
AU - Zhu, Wentao
AU - Cheng, Ke
AU - Chang, Heng
AU - Ye, Junchen
AU - Du, Bowen
AU - Lv, Weifeng
N1 - Publisher Copyright:
© 2026, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2026
Y1 - 2026
N2 - With the wide adoption of online education platforms, adaptive learning systems have become increasingly important. Learning Path Recommendation (LPR) aims to dynamically adjust learning content to optimize learning efficiency based on individual student needs. However, current LPR methods suffer from sparse reward for precise assessment and only focus on anonymous sessions that overlook more personalized and effective paths. To address these challenges, we propose UNO, UNified Offline Training Paradigm for Learning Path Recommendation. This approach introduces an offline training paradigm in reinforcement learning based LPR to provide dense process rewards by a personalized advantage based on a reward model, which can estimate the students’ internal knowledge levels on the learning targets. Additionally, we propose UniLPR model, a personalized recommendation system that unifies modeling the implicit relationships between students’ long-term accumulation and evolving requirements for questions, and refines through Group Relative Policy Optimization (GRPO). Finally, we design learning tasks that encompass historical reviewing, recent learning, and long-term exploratory learning to simulate the comprehensive and diverse needs of students. Our UNO achieves state-of-the-art performance across all tasks, demonstrating its effectiveness.
UR - https://www.scopus.com/pages/publications/105034599040
U2 - 10.1609/aaai.v40i18.38591
DO - 10.1609/aaai.v40i18.38591
M3 - Conference contribution
AN - SCOPUS:105034599040
SN - 9781577359067
T3 - Proceedings of the AAAI Conference on Artificial Intelligence
SP - 15617
EP - 15625
BT - Proceedings of the AAAI Conference on Artificial Intelligence
A2 - Koenig, Sven
A2 - Jenkins, Chad
A2 - Taylor, Matthew E.
PB - Association for the Advancement of Artificial Intelligence
T2 - 40th AAAI Conference on Artificial Intelligence, AAAI 2026
Y2 - 20 January 2026 through 27 January 2026
ER -