TY - GEN
T1 - DIAS
T2 - SIAM International Conference on Data Mining 2015, SDM 2015
AU - Liu, Hongfu
AU - Wu, Junjie
AU - Tao, Dacheng
AU - Zhang, Yuchao
AU - Fu, Yun
N1 - Publisher Copyright:
Copyright © SIAM.
PY - 2015
Y1 - 2015
N2 - Despite extensive study, text clustering remains a critical challenge in the data mining community. Although various techniques have been proposed to overcome some of these challenges, problems persist when the data contain weakly related or even noisy features. In response, we propose a DIsassemble-ASsemble (DIAS) framework for text clustering. DIAS employs simple random feature sampling to disassemble high-dimensional text data and gain diverse structural knowledge, which also helps it avoid the bulk of the noisy features. The resulting multi-view knowledge is then assembled by weighted Information-theoretic Consensus Clustering (ICC) to obtain a high-quality consensus partitioning. Extensive experiments on eight real-world text data sets demonstrate the advantages of DIAS over other widely used methods. In particular, DIAS shows strengths in learning from very weak basic partitionings. In addition, its natural suitability for distributed computing makes DIAS a promising candidate for big text clustering.
AB - Despite extensive study, text clustering remains a critical challenge in the data mining community. Although various techniques have been proposed to overcome some of these challenges, problems persist when the data contain weakly related or even noisy features. In response, we propose a DIsassemble-ASsemble (DIAS) framework for text clustering. DIAS employs simple random feature sampling to disassemble high-dimensional text data and gain diverse structural knowledge, which also helps it avoid the bulk of the noisy features. The resulting multi-view knowledge is then assembled by weighted Information-theoretic Consensus Clustering (ICC) to obtain a high-quality consensus partitioning. Extensive experiments on eight real-world text data sets demonstrate the advantages of DIAS over other widely used methods. In particular, DIAS shows strengths in learning from very weak basic partitionings. In addition, its natural suitability for distributed computing makes DIAS a promising candidate for big text clustering.
UR - https://www.scopus.com/pages/publications/84961956889
U2 - 10.1137/1.9781611974010.86
DO - 10.1137/1.9781611974010.86
M3 - Conference contribution
AN - SCOPUS:84961956889
T3 - SIAM International Conference on Data Mining 2015, SDM 2015
SP - 766
EP - 774
BT - SIAM International Conference on Data Mining 2015, SDM 2015
A2 - Venkatasubramanian, Suresh
A2 - Ye, Jieping
PB - Society for Industrial and Applied Mathematics Publications
Y2 - 30 April 2015 through 2 May 2015
ER -