
Research of automatic topic detection based on incremental clustering

Research output: Contribution to journal › Article › peer-review

Abstract

With the exponential growth of information on the Internet, it has become increasingly difficult to find and organize relevant material. Topic detection and tracking (TDT) is a research area that addresses this problem. As one of the basic tasks of TDT, topic detection is the problem of grouping stories by the topics they discuss. This paper proposes a new topic detection method (TPIC) based on an incremental clustering algorithm. The proposed method aims for high accuracy and for the ability to estimate the true number of topics in the document corpus. A term reweighting algorithm is used to cluster the given document corpus accurately and efficiently, and a self-refinement process of discriminative feature identification is proposed to improve clustering performance. Furthermore, the "aging" nature of topics is used to pre-cluster stories, and the Bayesian information criterion (BIC) is used to estimate the true number of topics. Experimental results on the Linguistic Data Consortium (LDC) dataset TDT-4 show that the proposed model can improve both efficiency and accuracy compared with other models.
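
The abstract does not spell out the TPIC algorithm, so the following is only a minimal sketch of generic single-pass incremental clustering, the family of methods the paper builds on. It assumes raw term counts, cosine similarity, and a fixed similarity threshold; the names `cosine`, `incremental_cluster`, and `threshold` are hypothetical, and the paper's term reweighting, aging-based pre-clustering, and BIC-based estimate of the topic count are not modeled here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_cluster(stories, threshold=0.3):
    """Assign each story, in arrival order, to the most similar existing
    cluster, or open a new cluster if no similarity reaches the threshold.

    Illustrative assumption: plain term counts stand in for the paper's
    reweighted term vectors, and the threshold value is arbitrary.
    """
    centroids = []  # one summed term-count vector per cluster
    labels = []
    for text in stories:
        vec = Counter(text.lower().split())
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # fold the story into the cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))  # start a new topic
            labels.append(len(centroids) - 1)
    return labels


if __name__ == "__main__":
    stories = [
        "earthquake hits city",
        "city earthquake damage",
        "election results announced",
        "election vote results",
    ]
    print(incremental_cluster(stories))
```

Because each story is compared only against the current cluster centroids, the corpus is processed in one pass and new topics can appear at any time; the number of clusters is a by-product of the threshold rather than a fixed input, which is the gap the abstract says BIC is meant to address.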

Original language: English
Pages (from-to): 1578-1587
Number of pages: 10
Journal: Ruan Jian Xue Bao/Journal of Software
Volume: 23
Issue number: 6
State: Published - Jun 2012

Keywords

  • Incremental clustering
  • Reweighting
  • TDT
  • Topic detection
  • Topic detection and tracking
