An active learning method based on mistake sampling for large scale imbalanced classification

  • Beihang University
  • National Computer Network Emergency Response Technical Team

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

The challenge of learning from large-scale, imbalanced data sets has attracted a great deal of attention from both industry and academia, and it is an important task for fraud detection in telecommunications, finance, and online commerce. In the era of big data, it is almost impossible to train a classification model on the complete data set because of the space and time complexity. How to sample a training set from the original large-scale set so that it yields more accurate predictions has therefore become a focal point of study. Active learning iteratively adds a small batch of data to the initial training set, so that the training set is augmented with informative samples. However, active learning methods can become ineffective on extremely imbalanced data. To that end, in this paper we propose a novel method, based on active learning, for sampling the training set in order to solve large-scale, imbalanced learning problems. We also use SMOTE, one of the most widely used resampling methods, to balance the training set. Experiments were conducted on real-world data from the telecommunications industry. The results show that the proposed solution performs steadily and better than widely used active learning methods.
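The two generic building blocks named in the abstract can be illustrated with a minimal sketch. It is not the paper's mistake-sampling method, whose details the abstract does not give. The first function performs SMOTE-style interpolation between minority-class neighbours, and the second uses plain uncertainty sampling as a stand-in query strategy for picking the next batch. The function names, parameters, and toy data are hypothetical, and only NumPy is assumed.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority points (SMOTE-style sketch).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)

    # Pairwise squared Euclidean distances within the minority class.
    dist = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                # seed samples
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                         # position on segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])


def select_uncertain_batch(pos_proba, batch_size):
    """Return indices of the pool samples whose predicted positive-class
    probability is closest to 0.5 (generic uncertainty sampling, used here
    only as a placeholder for the paper's query strategy)."""
    pos_proba = np.asarray(pos_proba, dtype=float)
    order = np.argsort(np.abs(pos_proba - 0.5), kind="stable")
    return order[:batch_size]


if __name__ == "__main__":
    # Toy minority class: the four corners of the unit square.
    minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    synthetic = smote_oversample(minority, n_new=10, k=2)
    print(synthetic.shape)

    # Toy pool probabilities from some already-trained classifier.
    print(select_uncertain_batch([0.9, 0.48, 0.1, 0.55], batch_size=2))
```

In an active learning loop, a classifier would be retrained after each selected batch is labelled and added, with oversampling applied to the training set before each fit. The order of those steps in the paper is not stated in the abstract.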

Original language: English
Title of host publication: 14th International Conference on Services Systems and Services Management, ICSSSM 2017 - Proceedings
Editors: Xiaoqiang Cai, Jiafu Tang, Jian Chen
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509063697
DOI
Publication status: Published - 28 Jul 2017
Event: 14th International Conference on Services Systems and Services Management, ICSSSM 2017 - Dalian, China
Duration: 16 Jun 2017 to 18 Jun 2017

Publication series

Name: 14th International Conference on Services Systems and Services Management, ICSSSM 2017 - Proceedings

Conference

Conference: 14th International Conference on Services Systems and Services Management, ICSSSM 2017
Country/Territory: China
City: Dalian
Period: 16/06/17 to 18/06/17
