
An active learning method based on mistake sampling for large scale imbalanced classification

  • Jia Guo
  • Xin Wan*
  • Hao Lin
  • Peng Li
  • Guannan Liu
  • Yueying He

*Corresponding author for this work

  • Beihang University
  • National Computer Network Emergency Response Technical Team

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Learning from large-scale, imbalanced data sets has attracted a great deal of attention from both industry and academia, and is regarded as an important task for fraud detection in telecommunications, finance, and online commerce. In general, it is almost impossible to train a classification model on the complete data set, especially in the era of big data, because of the space and time complexity. Thus, how to sample a training set from the original large-scale set so that it yields more accurate predictions has become a focal point of study. Active learning provides a way to iteratively add a small batch of data to the initial training set, so that the training set is augmented with informative samples. However, when applied to extremely imbalanced data, active learning methods can become ineffective. To that end, in this paper we propose a novel active-learning-based method for sampling the training set, in order to solve the large-scale imbalanced learning problem. Moreover, we use SMOTE, one of the most widely used resampling methods, to balance the training set. The experiment was conducted on real-world data from the telecommunications industry. The results show that our proposed solution performs more steadily and better than widely used active learning methods.
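
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea it describes, not the authors' implementation. It assumes that labels are available for the whole pool (the task is choosing a small training subset), that "mistake sampling" means adding a batch of currently misclassified points each round, and that a nearest-centroid model and a simplified SMOTE-style interpolation (random minority pairs, no nearest-neighbour search) are acceptable stand-ins. All function names and parameters here are hypothetical.

```python
import random

def centroid(points):
    """Mean of a non-empty list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit(train):
    """Stand-in classifier (assumption): one centroid per label."""
    labels = {y for _, y in train}
    return {lab: centroid([x for x, y in train if y == lab]) for lab in labels}

def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda lab: sq_dist(model[lab], x))

def oversample_minority(train, minority, rng):
    """Simplified SMOTE-style step: interpolate between random pairs of
    minority points until the minority count matches the majority count."""
    minority_pts = [x for x, y in train if y == minority]
    majority_n = sum(1 for _, y in train if y != minority)
    synthetic = []
    while len(minority_pts) >= 2 and len(minority_pts) + len(synthetic) < majority_n:
        a, b = rng.sample(minority_pts, 2)
        t = rng.random()
        point = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
        synthetic.append((point, minority))
    return train + synthetic

def mistake_sampling(pool, seed_idx, batch_size, rounds, minority=1, seed=0):
    """Grow a training subset of `pool` (a list of (features, label) pairs)
    by adding, each round, a random batch of points the current model
    misclassifies. Returns the sorted indices of the selected points."""
    rng = random.Random(seed)
    labeled = set(seed_idx)
    for _ in range(rounds):
        train = [pool[i] for i in sorted(labeled)]
        model = fit(oversample_minority(train, minority, rng))
        mistakes = [i for i, (x, y) in enumerate(pool)
                    if i not in labeled and predict(model, x) != y]
        if not mistakes:
            break  # the current subset already classifies the whole pool
        labeled.update(rng.sample(mistakes, min(batch_size, len(mistakes))))
    return sorted(labeled)
```

Under these assumptions, a call such as `mistake_sampling(pool, seed_idx=[0, 1, 200, 201], batch_size=5, rounds=3)` returns the seed indices plus at most 15 additional indices of misclassified points. Synthetic points are generated only for model fitting and are never added to the selected subset.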

Original language: English
Title of host publication: 14th International Conference on Services Systems and Services Management, ICSSSM 2017 - Proceedings
Editors: Xiaoqiang Cai, Jiafu Tang, Jian Chen
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509063697
DOIs
State: Published - 28 Jul 2017
Event: 14th International Conference on Services Systems and Services Management, ICSSSM 2017 - Dalian, China
Duration: 16 Jun 2017 to 18 Jun 2017

Publication series

Name: 14th International Conference on Services Systems and Services Management, ICSSSM 2017 - Proceedings

Conference

Conference: 14th International Conference on Services Systems and Services Management, ICSSSM 2017
Country/Territory: China
City: Dalian
Period: 16/06/17 to 18/06/17

Keywords

  • Active learning
  • Fraud detection
  • Imbalanced classification
  • Resampling
