TY - GEN
T1 - Modeling both coarse-grained and fine-grained topics in massive text data
AU - Zhang, Weifan
AU - Zhang, Hui
AU - Zuo, Yuan
AU - Wang, Deqing
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/8/10
Y1 - 2015/8/10
N2 - Topic models have attracted much attention from investigators, as they provide users with insights into huge volumes of documents. However, most previous studies based on Non-negative Matrix Factorization (NMF) neglect to figure out which topics are widespread in the documents and which are not. These widespread topics, which we refer to as coarse-grained topics, have great significance for people who concentrate on common topics in a given text set. For example, after reading massive numbers of job ads, jobseekers are eager to learn employers' basic requirements, which can be regarded as coarse-grained topics, as well as the additional requirements, which can be deemed fine-grained topics. In this paper, we propose a novel method that applies two different sparseness constraints to NMF to tell coarse-grained topics and fine-grained topics apart. The experimental results demonstrate that the new model can not only discover coarse-grained topics but also extract fine-grained topics. We evaluate the performance of the new model via text clustering and classification, and the results show that the new model can learn more accurate topic representations of documents.
AB - Topic models have attracted much attention from investigators, as they provide users with insights into huge volumes of documents. However, most previous studies based on Non-negative Matrix Factorization (NMF) neglect to figure out which topics are widespread in the documents and which are not. These widespread topics, which we refer to as coarse-grained topics, have great significance for people who concentrate on common topics in a given text set. For example, after reading massive numbers of job ads, jobseekers are eager to learn employers' basic requirements, which can be regarded as coarse-grained topics, as well as the additional requirements, which can be deemed fine-grained topics. In this paper, we propose a novel method that applies two different sparseness constraints to NMF to tell coarse-grained topics and fine-grained topics apart. The experimental results demonstrate that the new model can not only discover coarse-grained topics but also extract fine-grained topics. We evaluate the performance of the new model via text clustering and classification, and the results show that the new model can learn more accurate topic representations of documents.
KW - non-negative matrix factorization
KW - text clustering
KW - text mining
KW - topic model
UR - https://www.scopus.com/pages/publications/84959492021
U2 - 10.1109/BigDataService.2015.21
DO - 10.1109/BigDataService.2015.21
M3 - Conference contribution
AN - SCOPUS:84959492021
T3 - Proceedings - 2015 IEEE 1st International Conference on Big Data Computing Service and Applications, BigDataService 2015
SP - 378
EP - 383
BT - Proceedings - 2015 IEEE 1st International Conference on Big Data Computing Service and Applications, BigDataService 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 1st IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2015
Y2 - 30 March 2015 through 3 April 2015
ER -