Modeling both coarse-grained and fine-grained topics in massive text data

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Topic models have attracted much attention from researchers because they give users insight into large volumes of documents. However, most previous studies based on Non-negative Matrix Factorization (NMF) do not distinguish topics that are widespread across the documents from those that are not. These widespread topics, which we refer to as coarse-grained topics, are of great significance to people who concentrate on the common topics in a given text set. For example, after reading a massive number of job ads, jobseekers are eager to learn employers' basic requirements, which can be regarded as the coarse-grained topics, as well as the additional requirements, which can be deemed the fine-grained topics. In this paper, we propose a novel method that applies two different sparseness constraints to NMF to tell coarse-grained topics and fine-grained topics apart. The experimental results demonstrate that the new model can not only discover coarse-grained topics but also extract fine-grained topics. We evaluate the performance of the new model via text clustering and classification, and the results show that the new model can learn more accurate topic representations of documents.
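
The abstract does not state the objective function, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a document-term matrix X is factorized as X ≈ WH, and that the two sparseness constraints can be approximated by two different L1 penalty weights on the columns of the document-topic matrix W: a weak penalty for topics meant to be widespread (coarse-grained) and a stronger one for topics meant to appear in few documents (fine-grained). The function name `sparse_nmf` and the parameters `n_coarse`, `n_fine`, `lam_coarse`, and `lam_fine` are hypothetical.

```python
import numpy as np

def sparse_nmf(X, n_coarse, n_fine, lam_coarse=0.0, lam_fine=1.0,
               n_iter=200, seed=0):
    """Illustrative NMF with per-topic L1 penalties on W (an assumption,
    not the formulation from the paper).

    X : non-negative array of shape (n_docs, n_terms)
    Returns W (n_docs, k) and H (k, n_terms) with k = n_coarse + n_fine.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    k = n_coarse + n_fine

    # Random positive initialization keeps multiplicative updates well defined.
    W = rng.random((n_docs, k)) + 1e-3
    H = rng.random((k, n_terms)) + 1e-3

    # One penalty weight per topic: the first n_coarse topics get the weak
    # penalty, the remaining n_fine topics get the strong one.
    lam = np.concatenate([np.full(n_coarse, lam_coarse),
                          np.full(n_fine, lam_fine)])
    eps = 1e-10

    for _ in range(n_iter):
        # Multiplicative update for W; the L1 weight enters the denominator
        # and shrinks the columns of the more heavily penalized topics.
        W *= (X @ H.T) / (W @ (H @ H.T) + lam + eps)
        # Standard multiplicative update for H (no penalty on H here).
        H *= (W.T @ X) / ((W.T @ W) @ H + eps)
        # Normalize topic rows of H and move the scale into W so the
        # product W @ H is unchanged and the penalty cannot be evaded
        # by inflating H.
        scale = np.linalg.norm(H, axis=1, keepdims=True) + eps
        H /= scale
        W *= scale.T

    return W, H
```

Under these assumptions, the fraction of near-zero entries in each column of W gives a rough indication of how widespread a topic is across the documents; whether this reproduces the behaviour reported in the paper is not something the sketch can confirm.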

Original language: English
Title of host publication: Proceedings - 2015 IEEE 1st International Conference on Big Data Computing Service and Applications, BigDataService 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 378-383
Number of pages: 6
ISBN (Electronic): 9781479981281
DOIs
State: Published - 10 Aug 2015
Event: 1st IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2015 - San Francisco, United States
Duration: 30 Mar 2015 to 3 Apr 2015

Publication series

Name: Proceedings - 2015 IEEE 1st International Conference on Big Data Computing Service and Applications, BigDataService 2015

Conference

Conference: 1st IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2015
Country/Territory: United States
City: San Francisco
Period: 30/03/15 to 3/04/15

Keywords

  • non-negative matrix factorization
  • text clustering
  • text mining
  • topic model
