
A centroid based text categorization method using Mean Shift

  • Beihang University
  • Research Institute of Beihang in Shenzhen

Research output: Contribution to journal › Article › peer-review

Abstract

Text categorization is an important research topic in Information Retrieval and one of the key techniques for handling and organizing the huge amount of text data available on the Internet and in other digital formats. In this paper, we propose a method for text categorization based on Mean Shift. The Mean Shift algorithm is a well-developed technique in computer vision research. We extend the application of Mean Shift to text categorization by reducing the dimensionality of the text vector space to a suitable scale. First, a low-dimensional feature space is constructed using feature selection based on information gain. Second, an adaptive Mean Shift algorithm is applied to detect the center (centroid) of each category in this feature space. Finally, each document is assigned to its most similar category by calculating the similarity between the document and the center of every category. Experimental results on the 20NewsGroup and Reuters-21578 corpora show that this method can achieve higher performance than some classic text categorization methods such as KNN, Naïve Bayes and SVM.
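The three steps in the abstract (feature selection by information gain, Mean Shift centroid detection, and assignment to the most similar centroid) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary term presence for information gain, raw term counts as document vectors, a fixed-bandwidth Gaussian kernel in place of the adaptive Mean Shift the paper describes, and cosine similarity as the similarity measure. All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    return (entropy(labels)
            - (len(present) / n) * entropy(present)
            - (len(absent) / n) * entropy(absent))

def select_features(docs, labels, k):
    """Step 1: keep the k terms with the highest information gain."""
    vocab = sorted({t for d in docs for t in d})
    return sorted(vocab, key=lambda t: -information_gain(docs, labels, t))[:k]

def vectorize(doc, features):
    """Term-count vector restricted to the selected features (an assumption)."""
    counts = Counter(doc)
    return [float(counts[t]) for t in features]

def mean_shift_mode(points, bandwidth=1.0, iters=50, tol=1e-6):
    """Step 2: shift from the category mean toward a density mode.

    Fixed-bandwidth Gaussian kernel; the paper uses an adaptive variant.
    """
    dim = len(points[0])
    x = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(iters):
        weights = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * bandwidth ** 2))
            for p in points
        ]
        total = sum(weights)
        new_x = [sum(w * p[i] for w, p in zip(weights, points)) / total
                 for i in range(dim)]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break
    return x

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def train(docs, labels, k=4, bandwidth=1.0):
    """Select features, then compute one Mean Shift centroid per category."""
    features = select_features(docs, labels, k)
    centroids = {}
    for c in sorted(set(labels)):
        pts = [vectorize(d, features) for d, y in zip(docs, labels) if y == c]
        centroids[c] = mean_shift_mode(pts, bandwidth)
    return features, centroids

def classify(doc, features, centroids):
    """Step 3: assign the document to the category with the most similar centroid."""
    v = vectorize(doc, features)
    return max(centroids, key=lambda c: cosine(v, centroids[c]))

# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    "goal match team win".split(),
    "team score goal".split(),
    "match team coach".split(),
    "stock market price".split(),
    "market trade stock bank".split(),
    "bank price market".split(),
]
labels = ["sport", "sport", "sport", "finance", "finance", "finance"]

features, centroids = train(docs, labels, k=4)
print(classify("team goal tonight".split(), features, centroids))  # sport
print(classify("market bank news".split(), features, centroids))   # finance
```

Starting each search at the category mean is a design choice of this sketch: it yields a single centroid per category, whereas a full Mean Shift run from many starting points could find several modes.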

Original language: English
Pages (from-to): 4703-4711
Number of pages: 9
Journal: Journal of Information and Computational Science
Volume: 10
Issue number: 14
Publication status: Published - 20 Sep 2013
