TY - GEN
T1 - Constrained Optimization to Improve Critical Rare Classes Performance Within the Top-Ranking Part
AU - Ying, Yuxin
AU - Zhuang, Fuzhen
AU - Liu, Ziyi
AU - Zhu, Dingyuan
AU - Wang, Daixin
AU - Qin, Xiaobo
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
AB - The widespread application of deep learning methods has brought to the fore the challenge of enhancing prediction performance within the highest-score segment of model predictions. In critical domains such as insurance fraud detection and bank cash-out detection, the focus is predominantly on the highest predicted scores, which correspond to high-risk users that need to be intercepted. However, most existing work still focuses on optimizing AUC globally, which often yields suboptimal performance within the top-ranking part. Moreover, these scenarios often face extreme data imbalance, where the positive samples of interest are in the minority. In this paper, we define the top-ranking optimization problem and propose an approach based on the Augmented Lagrangian Multiplier method (ALM) to solve it. Specifically, we modify the Discounted Cumulative Gain (DCG) metric to serve as the constraint on top-ranking and add it as a regularization term to the optimization objective. In addition, to ensure the effectiveness of the regularization term and avoid overfitting, we design a dynamically updated cache mechanism to store the hard samples. Our experimental results on three real-world datasets validate the effectiveness of our proposed method, demonstrating its potential to improve top-ranking prediction performance in imbalanced data settings.
KW - Imbalanced Learning
KW - Insurance Risk Control
KW - Top-ranking Optimization
UR - https://www.scopus.com/pages/publications/105020019217
U2 - 10.1007/978-3-032-05962-8_22
DO - 10.1007/978-3-032-05962-8_22
M3 - Conference contribution
AN - SCOPUS:105020019217
SN - 9783032059611
T3 - Lecture Notes in Computer Science
SP - 372
EP - 388
BT - Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, ECML PKDD 2025, Proceedings
A2 - Ribeiro, Rita P.
A2 - Jorge, Alípio M.
A2 - Soares, Carlos
A2 - Gama, João
A2 - Pfahringer, Bernhard
A2 - Japkowicz, Nathalie
A2 - Larrañaga, Pedro
A2 - Abreu, Pedro H.
PB - Springer Science and Business Media Deutschland GmbH
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2025
Y2 - 15 September 2025 through 19 September 2025
ER -