
Word2Cluster: A new multi-label text clustering algorithm with an adaptive clusters number

  • Beihang University
  • Guangzhou University
  • China Aerospace Science and Industry Corporation

Research output: Contribution to journal, conference article, peer-reviewed

Abstract

Text clustering has been widely used in many Natural Language Processing (NLP) applications such as text summarization and news recommendation. However, most current algorithms require the number of clusters to be predefined, and this number is difficult to determine in advance. Moreover, multi-label clustering is useful in many applications, but related work is scarce. Although several studies have attempted to solve these two problems, there is a need for methods that can solve both simultaneously. Therefore, we propose a new text clustering algorithm called Word2Cluster. Word2Cluster can automatically generate an adaptive number of clusters and supports multi-label clustering. To test the performance of Word2Cluster, we build a Chinese text dataset, Hotline, based on real-world applications. To better evaluate the clustering results, we propose an improved evaluation method for multi-label text clustering based on basic accuracy, precision and recall. Experimental results on the Chinese text dataset (Hotline) and a public English text dataset (Reuters) demonstrate that our algorithm achieves a better F1-measure and runs faster than the state-of-the-art baselines.
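The abstract names accuracy, precision, recall and F1-measure for multi-label clustering but does not define the improved evaluation method. As a point of reference only, the following minimal sketch computes the standard example-based versions of these measures, where each document has a set of true labels and a set of predicted labels. The function name `multilabel_scores` and the input layout are assumptions made for illustration; this is not the evaluation method proposed in the paper.

```python
from typing import List, Set, Tuple


def multilabel_scores(
    true_labels: List[Set[str]],
    pred_labels: List[Set[str]],
) -> Tuple[float, float, float, float]:
    """Example-based accuracy, precision, recall and F1, averaged over documents.

    Hypothetical helper: standard textbook definitions, not the paper's
    improved evaluation method.
    """
    if len(true_labels) != len(pred_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(true_labels)
    acc = prec = rec = 0.0
    for truth, pred in zip(true_labels, pred_labels):
        inter = len(truth & pred)   # labels assigned correctly
        union = len(truth | pred)   # all labels involved for this document
        acc += inter / union if union else 1.0    # Jaccard-style accuracy
        prec += inter / len(pred) if pred else 0.0
        rec += inter / len(truth) if truth else 0.0

    acc, prec, rec = acc / n, prec / n, rec / n
    # F1 is taken here as the harmonic mean of the averaged precision and recall.
    f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
    return acc, prec, rec, f1


if __name__ == "__main__":
    truth = [{"a", "b"}, {"c"}]
    pred = [{"a"}, {"c", "d"}]
    print(multilabel_scores(truth, pred))
```

In this toy input, the first document is missing one true label and the second has one spurious label, so both precision and recall fall below 1 even though every document shares at least one label with its ground truth.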

Original language: English
Article number: 9013266
Journal: Proceedings - IEEE Global Communications Conference, GLOBECOM
DOI
Publication status: Published - 2019
Event: 2019 IEEE Global Communications Conference, GLOBECOM 2019 - Waikoloa, United States
Duration: 9 Dec 2019 to 13 Dec 2019
