TY - JOUR
T1 - Bi-level Hierarchical Neural Contextual Bandits for Online Recommendation
AU - Qi, Yunzhe
AU - Zhou, Yao
AU - Ban, Yikun
AU - Stewart, Allan
AU - Ruan, Chuanwei
AU - He, Jiachuan
AU - Prasad, Shishir Kumar
AU - Wang, Haixun
AU - He, Jingrui
N1 - Publisher Copyright:
© 2026, Transactions on Machine Learning Research. All rights reserved.
PY - 2026
Y1 - 2026
N2 - Contextual bandit algorithms aim to identify the optimal choice among a set of candidate arms, based on their contextual information. Among others, neural contextual bandit algorithms have demonstrated generally superior performance compared to conventional linear and kernel-based methods. Nevertheless, neural methods can be inherently unsuitable for handling a large number of candidate arms due to their high computational cost when performing principled exploration. Motivated by the widespread availability of arm category information (e.g., movie genres, retailer types), we formulate contextual bandits as a bi-level online recommendation problem, and propose a novel neural bandit framework, named H2N-Bandit, which utilizes a bi-level hierarchical neural architecture to mitigate the substantial computational cost found in conventional neural bandit methods. To demonstrate its theoretical effectiveness, we provide regret analysis under general over-parameterization settings, along with a guarantee for category-level recommendation. To illustrate its effectiveness and efficiency, we conduct extensive experiments on multiple real-world data sets, highlighting that H2N-Bandit can significantly reduce the computational cost over existing strong non-linear baselines, while achieving better or comparable performance under online recommendation settings.
UR - https://www.scopus.com/pages/publications/105030091974
M3 - Comment/debate
AN - SCOPUS:105030091974
SN - 2835-8856
VL - 2026 January
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -