Abstract
Text categorization is an important research topic in Information Retrieval and one of the key techniques for handling and organizing the huge amount of text data available on the Internet and in other digital formats in our daily life. In this paper, we propose a text categorization method based on Mean Shift. The Mean Shift algorithm is a well-developed technique in computer vision research. We extend its application to text categorization by reducing the dimensionality of the text vector space to a suitable scale. First, a low-dimensional feature space is constructed by feature selection based on information gain. Second, an adaptive Mean Shift algorithm is applied to detect the center (centroid) of each category in this feature space. Finally, each document is assigned to its most similar category by computing the similarity between the document and the center of every category. Experimental results on the 20 Newsgroups and Reuters-21578 corpora show that this method achieves higher performance than some classic text categorization methods such as KNN, Naïve Bayes and SVM.
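The three steps described above can be sketched as follows. This is a minimal, stdlib-only illustration, not the authors' implementation. It assumes binary term-presence information gain for feature selection, L2-normalised term-frequency vectors, a Gaussian-kernel Mean Shift started at each category's mean, a bandwidth equal to the mean distance of the category's documents to that mean (a hypothetical stand-in for the paper's unspecified "adaptive" rule), and cosine similarity for the final assignment. All function names and the toy corpus are made up for illustration.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document' w.r.t. the class."""
    n = len(labels)
    ig = entropy(list(Counter(labels).values()))
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    for part in (with_t, without_t):
        if part:  # subtract P(part) * H(C | part)
            ig -= len(part) / n * entropy(list(Counter(part).values()))
    return ig

def select_features(docs, labels, k):
    """Step 1: keep the k terms with the highest IG (ties stay alphabetical)."""
    vocab = sorted({t for d in docs for t in d})
    return sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)[:k]

def vectorize(doc, features):
    """L2-normalised term-frequency vector restricted to the selected features."""
    counts = Counter(doc)
    v = [float(counts[f]) for f in features]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_shift_mode(points, tol=1e-6, max_iter=100):
    """Step 2: Gaussian-kernel mean shift, started at the category mean.

    ASSUMPTION: bandwidth h = mean distance of the points to their mean,
    a hypothetical stand-in for the paper's adaptive bandwidth rule."""
    dim = len(points[0])
    x = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    h = sum(dist(p, x) for p in points) / len(points) or 1.0
    for _ in range(max_iter):
        w = [math.exp(-0.5 * (dist(p, x) / h) ** 2) for p in points]
        total = sum(w)
        new_x = [sum(wi * p[j] for wi, p in zip(w, points)) / total for j in range(dim)]
        if dist(new_x, x) < tol:  # shift is negligible: converged to a density mode
            return new_x
        x = new_x
    return x

def cosine(a, b):
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def train(docs, labels, k):
    """Select features, then find one Mean Shift centroid per category."""
    features = select_features(docs, labels, k)
    centroids = {}
    for c in sorted(set(labels)):
        pts = [vectorize(d, features) for d, y in zip(docs, labels) if y == c]
        centroids[c] = mean_shift_mode(pts)
    return features, centroids

def classify(doc, features, centroids):
    """Step 3: assign the document to the category with the most similar centroid."""
    v = vectorize(doc, features)
    return max(centroids, key=lambda c: cosine(v, centroids[c]))

# Toy, made-up corpus for illustration only.
train_docs = [
    "goal match team score".split(),
    "team player goal league".split(),
    "cpu memory kernel driver".split(),
    "kernel compiler memory code".split(),
]
train_labels = ["sport", "sport", "tech", "tech"]
features, centroids = train(train_docs, train_labels, k=4)
print(features)  # ['goal', 'kernel', 'memory', 'team']
print(classify("the team scored a late goal".split(), features, centroids))  # sport
```

Starting the search at the category mean lets Mean Shift move toward the nearest density mode, so, under these assumptions, a few outlying documents pull the centroid less than they would pull a plain arithmetic mean.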
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 4703-4711 |
| Number of pages | 9 |
| Journal | Journal of Information and Computational Science |
| Volume | 10 |
| Issue | 14 |
| Publication status | Published - 20 Sep 2013 |
Fingerprint
Dive into the research topics of 'A centroid based text categorization method using Mean Shift'. Together they form a unique fingerprint.