
Local decomposition for rare class analysis

  • Junjie Wu*
  • Peng Wu
  • Jian Chen
  • Hui Xiong
  • *Corresponding author for this work
  • Tsinghua University
  • Rutgers University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Given its importance, the problem of predicting rare classes in large-scale multi-labeled data sets has attracted great attention in the literature. However, the rare-class problem remains a critical challenge, because no natural way has been developed for handling imbalanced class distributions. This paper fills this void by developing a method for Classification using lOcal clusterinG (COG). Specifically, for a data set with an imbalanced class distribution, we perform clustering within each large class and produce sub-classes with relatively balanced sizes. Then, we apply traditional supervised learning algorithms, such as Support Vector Machines (SVMs), for classification. Our experimental results on various real-world data sets show that our method produces significantly higher prediction accuracies on rare classes than state-of-the-art methods. Furthermore, we show that COG can also improve the performance of traditional supervised learning algorithms on data sets with balanced class distributions.
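
The local-decomposition idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `k` and `min_size` parameters, and the farthest-first initialization of k-means are all hypothetical choices made here, and a simple nearest-centroid classifier stands in for the SVM so that the sketch needs only the Python standard library.

```python
from collections import Counter, defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def farthest_first(points, k):
    """Deterministic initialization (an assumption of this sketch): start from
    the first point, then repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=10):
    """Plain Lloyd-style k-means; returns one cluster index per point."""
    centers = farthest_first(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = mean(members)
    return assign


def local_decompose(X, y, k, min_size):
    """Cluster within each large class (at least `min_size` instances) and
    relabel every instance with a (class, sub-class) pair."""
    sub = [(label, 0) for label in y]
    for label, count in Counter(y).items():
        if count < min_size:
            continue  # small (rare) classes are left as a single sub-class
        idx = [i for i, l in enumerate(y) if l == label]
        assign = kmeans([X[i] for i in idx], k)
        for i, a in zip(idx, assign):
            sub[i] = (label, a)
    return sub


def fit_centroids(X, sub_labels):
    """Stand-in classifier (not an SVM): one centroid per sub-class."""
    groups = defaultdict(list)
    for x, s in zip(X, sub_labels):
        groups[s].append(x)
    return {s: mean(pts) for s, pts in groups.items()}


def predict(model, x):
    """Predict the nearest sub-class, then map it back to its original class."""
    nearest = min(model, key=lambda s: sq_dist(x, model[s]))
    return nearest[0]
```

In use, one would call `local_decompose` on the training data, fit a classifier on the resulting sub-class labels, and map each predicted sub-class back to its parent class, as `predict` does here. The intent is that a large class made of several distinct groups no longer has to be represented as a single block when it is separated from a rare class.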

Original language: English
Title of host publication: KDD-2007
Subtitle of host publication: Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages: 814-823
Number of pages: 10
State: Published - 2007
Externally published: Yes
Event: KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - San Jose, CA, United States
Duration: 12 Aug 2007 to 15 Aug 2007

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country/Territory: United States
City: San Jose, CA
Period: 12/08/07 to 15/08/07

Keywords

  • K-means clustering
  • Support vector machines
  • Local clustering
  • Rare class analysis
