Automatic topic detection with an incremental clustering algorithm

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Most existing topic detection approaches are neither accurate nor efficient enough. In this paper, we propose a new topic detection method (TPIC) based on an incremental clustering algorithm. It employs a self-refinement process of discriminative feature identification and a term reweighting algorithm to accurately cluster documents that discuss the same topic. For efficiency, the "aging" nature of topics is used to precluster stories. The Bayesian Information Criterion (BIC) is used to estimate the true number of topics automatically. Experimental results on the Linguistic Data Consortium (LDC) TDT4 dataset show that the proposed method improves both efficiency and accuracy compared to other methods.
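The abstract does not give the details of TPIC, so the following is only a minimal sketch of the general idea of incremental (single-pass) clustering with an "aging" window. It is not the authors' method. The function names, the raw term-frequency vectors, the cosine similarity measure, and the `threshold` and `window` parameters are all assumptions made for illustration. Term reweighting, self-refinement, and BIC-based model selection are omitted.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_cluster(stories, threshold=0.3, window=2):
    """Assign each story to a topic cluster in a single pass.

    stories: sequence of (day, text) pairs in chronological order.
    threshold: hypothetical minimum similarity needed to join a cluster.
    window: hypothetical "aging" limit; a cluster with no story in the
        last `window` days is no longer considered a candidate.
    Returns a list of clusters, each a list of story indices.
    """
    clusters = []
    for idx, (day, text) in enumerate(stories):
        vec = Counter(text.lower().split())
        best, best_sim = None, threshold
        for cluster in clusters:
            if day - cluster["last_day"] > window:
                continue  # topic has aged out; skip the comparison
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            # No active cluster is similar enough: start a new topic.
            clusters.append({"centroid": vec, "last_day": day, "members": [idx]})
        else:
            # Fold the story into the cluster's summed term vector.
            best["centroid"].update(vec)
            best["last_day"] = day
            best["members"].append(idx)
    return [c["members"] for c in clusters]


stories = [
    (0, "earthquake hits city"),
    (1, "earthquake damage city rescue"),
    (1, "election vote results"),
    (10, "earthquake hits city"),
]
print(incremental_cluster(stories))
```

In this sketch the aging window is what saves work: old clusters are skipped without computing a similarity, so the last story starts a new topic even though its text matches an earlier one. Widening `window` lets it rejoin the earlier cluster at the cost of more comparisons.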

Original language: English
Title of host publication: Web Information Systems and Mining - International Conference, WISM 2010, Proceedings
Pages: 344-351
Number of pages: 8
Edition: M4D
State: Published - 2010
Event: 2010 International Conference on Web Information Systems and Mining, WISM 2010 - Sanya, China
Duration: 23 Oct 2010 to 24 Oct 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: M4D
Volume: 6318 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2010 International Conference on Web Information Systems and Mining, WISM 2010
Country/Territory: China
City: Sanya
Period: 23/10/10 to 24/10/10

Keywords

  • Incremental clustering
  • TDT
  • Term reweighting
  • Topic Detection
