
Research of automatic topic detection based on incremental clustering

Research output: Contribution to journal, Article, peer-reviewed

Abstract

With the exponential growth of information on the Internet, it has become increasingly difficult to find and organize relevant material. Topic detection and tracking (TDT) is a research area that addresses this problem. Topic detection, one of the basic tasks of TDT, groups stories according to the topics they discuss. This paper proposes a new topic detection method (TPIC) based on an incremental clustering algorithm. The method aims for high accuracy and for the ability to estimate the true number of topics in the document corpus. A term reweighting algorithm is used to cluster the given document corpus accurately and efficiently, and a self-refinement process of discriminative feature identification is proposed to improve clustering performance. Furthermore, the "aging" nature of topics is used to precluster stories, and the Bayesian information criterion (BIC) is used to estimate the true number of topics. Experimental results on the Linguistic Data Consortium (LDC) TDT-4 dataset show that the proposed model improves both efficiency and accuracy compared to other models.

Original language: English
Pages (from-to): 1578-1587
Number of pages: 10
Journal: Ruan Jian Xue Bao/Journal of Software
Volume: 23
Issue number: 6
Publication status: Published - Jun 2012
