Abstract
Text clustering is widely used in Natural Language Processing (NLP) applications such as text summarization and news recommendation. However, most current algorithms require the number of clusters to be specified in advance, and this number is difficult to determine. Moreover, multi-label clustering is useful in many applications, yet little related work is available. Although several studies have attempted to address one or the other of these two problems, methods that solve both simultaneously are still needed. We therefore propose a new text clustering algorithm called Word2Cluster, which automatically generates an adaptive number of clusters and supports multi-label clustering. To test the performance of Word2Cluster, we build a Chinese text dataset, Hotline, based on real-world applications. To better evaluate the clustering results, we propose an improved evaluation method for multi-label text clustering based on basic accuracy, precision, and recall. Experimental results on the Chinese text dataset (Hotline) and a public English text dataset (Reuters) demonstrate that our algorithm achieves a better F1-measure and runs faster than the state-of-the-art baselines.
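The abstract does not spell out the improved evaluation method, so the following is only a minimal sketch of the standard example-based accuracy, precision, recall, and F1 for multi-label assignments, on which such a method could build. It is not the paper's metric. The function name `multilabel_scores` and the assumption that each cluster has already been mapped to a reference category are hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def multilabel_scores(
    true_labels: List[Set[str]], pred_labels: List[Set[str]]
) -> Dict[str, float]:
    """Example-based multi-label accuracy, precision, recall, and F1.

    Assumption (not from the paper): every document carries a set of
    reference categories and a set of predicted categories, with clusters
    already mapped onto the reference categories.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("true_labels and pred_labels must have equal length")
    if not true_labels:
        raise ValueError("at least one document is required")

    n = len(true_labels)
    accuracy = precision = recall = 0.0
    for truth, pred in zip(true_labels, pred_labels):
        overlap = len(truth & pred)
        union = len(truth | pred)
        # Accuracy: Jaccard overlap of the two label sets.
        accuracy += overlap / union if union else 1.0
        # Precision: share of predicted labels that are correct.
        precision += overlap / len(pred) if pred else 0.0
        # Recall: share of reference labels that were recovered.
        recall += overlap / len(truth) if truth else 0.0

    accuracy, precision, recall = accuracy / n, precision / n, recall / n
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    truth = [{"a", "b"}, {"c"}]
    pred = [{"a"}, {"c", "d"}]
    print(multilabel_scores(truth, pred))
```

Here F1 is taken as the harmonic mean of the averaged precision and recall; averaging per-document F1 instead is an equally common convention and can give a different value.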
| Original language | English |
|---|---|
| Article number | 9013266 |
| Journal | Proceedings - IEEE Global Communications Conference, GLOBECOM |
| DOI | |
| Publication status | Published - 2019 |
| Event | 2019 IEEE Global Communications Conference, GLOBECOM 2019 - Waikoloa, United States. Duration: 9 Dec 2019 → 13 Dec 2019 |
| Title | Word2Cluster: A new multi-label text clustering algorithm with an adaptive clusters number |