DIAS: A disassemble-assemble framework for highly sparse text clustering

  • Hongfu Liu
  • Junjie Wu
  • Dacheng Tao
  • Yuchao Zhang
  • Yun Fu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Despite extensive study, text clustering remains a critical challenge in the data mining community. Although various techniques have been proposed to address it, problems persist when dealing with weakly related or even noisy features. In response, we propose a DIsassemble-ASsemble (DIAS) framework for text clustering. DIAS employs simple random feature sampling to disassemble high-dimensional text data and gain diverse structural knowledge, which also helps avoid the bulk of noisy features. The multi-view knowledge is then assembled by weighted Information-theoretic Consensus Clustering (ICC) to obtain a high-quality consensus partitioning. Extensive experiments on eight real-world text data sets demonstrate the advantages of DIAS over other widely used methods. In particular, DIAS shows strength in learning from very weak basic partitionings. In addition, its natural suitability for distributed computing makes DIAS a promising candidate for big text clustering.
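
The disassemble-assemble idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`farthest_first_kmeans`, `dias_sketch`), the parameters (`n_views`, `ratio`, `seed`), and the toy data are all hypothetical, and the assemble step uses plain unweighted k-means on the concatenated one-hot labels of the basic partitionings as a stand-in for the weighted ICC named in the abstract.

```python
import random


def farthest_first_kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means with farthest-first initialisation; returns labels."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # First center is a random point; each further center is the point
    # farthest from its nearest already-chosen center.
    centers = [list(points[rng.randrange(len(points))])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))

    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def dias_sketch(X, k, n_views=10, ratio=0.5, seed=0):
    """Hypothetical helper: random feature sampling, then a simple consensus."""
    rng = random.Random(seed)
    n_features = len(X[0])
    m = max(1, int(ratio * n_features))
    encoded = [[] for _ in X]  # one-hot labels of all views, per document

    for _ in range(n_views):
        # Disassemble: cluster the documents on a random subset of features.
        cols = rng.sample(range(n_features), m)
        view = [[row[c] for c in cols] for row in X]
        basic = farthest_first_kmeans(view, k, rng)
        for enc, lab in zip(encoded, basic):
            enc.extend(1.0 if lab == j else 0.0 for j in range(k))

    # Assemble: unweighted k-means on the encoded basic partitionings
    # (an assumption standing in for weighted ICC).
    return farthest_first_kmeans(encoded, k, rng)


# Toy document-term matrix: two groups of documents with disjoint vocabularies.
docs = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
labels = dias_sketch(docs, k=2)
print(labels)  # the first three documents share one label, the last three the other
```

Because each view is clustered independently of the others, the loop over views is the part that could in principle be distributed across workers, which is the property the abstract highlights for large collections.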

Original language: English
Title of host publication: SIAM International Conference on Data Mining 2015, SDM 2015
Editors: Suresh Venkatasubramanian, Jieping Ye
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 766-774
Number of pages: 9
ISBN (Electronic): 9781510811522
State: Published - 2015
Event: SIAM International Conference on Data Mining 2015, SDM 2015 - Vancouver, Canada
Duration: 30 Apr 2015 to 2 May 2015

Publication series

Name: SIAM International Conference on Data Mining 2015, SDM 2015

Conference

Conference: SIAM International Conference on Data Mining 2015, SDM 2015
Country/Territory: Canada
City: Vancouver
Period: 30/04/15 to 2/05/15
