A semantics-based method for clustering of Chinese web search results

  • Indiana University-Purdue University Fort Wayne
  • Beihang University

Research output: Contribution to journal › Article › peer-review

Abstract

Information explosion is a critical challenge to the development of modern information systems. In particular, an information system that operates over the Internet must cope with a volume of web information that is growing exponentially. Search engines, such as Google and Baidu, are essential tools for finding information on the Internet. Valuable information, however, is still likely to be submerged in the ocean of search results that these tools return. By automatically clustering the results into groups based on subject, a search engine with a clustering feature allows users to select the most relevant results quickly. In this paper, we propose an online semantics-based method to cluster Chinese web search results. First, we employ a generalised suffix tree to extract the longest common substrings (LCSs) from search snippets. Second, we use HowNet to calculate the similarities of the words derived from the LCSs, and extract the most representative features by constructing a vocabulary chain. Third, we construct a vector of text features and calculate the semantic similarities between snippets. Finally, we improve the Chameleon algorithm to cluster the snippets. Extensive experimental results show that the proposed algorithm outperforms the suffix tree clustering method and other traditional clustering methods.
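
To make the first and third steps more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a dynamic-programming search in place of the generalised suffix tree (slower, but it finds the same longest common substring for a pair of snippets) and a plain term-frequency cosine in place of the HowNet-based semantic similarity. The function names `longest_common_substring` and `cosine_similarity` are illustrative only.

```python
from collections import Counter
from math import sqrt


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest substring shared by a and b (first found on ties).

    Assumption: dynamic programming stands in for the generalised suffix
    tree named in the abstract.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine of two sparse feature vectors stored as Counters.

    Assumption: raw feature counts, with no HowNet-based word similarity.
    """
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

For example, `longest_common_substring("搜索引擎聚类方法", "中文搜索引擎结果")` yields the shared phrase `"搜索引擎"`, which could then serve as a candidate feature. Working at the character level avoids a separate word-segmentation step for Chinese text, though the sketch makes no claim that the paper handles segmentation this way.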

Original language: English
Pages (from-to): 147-165
Number of pages: 19
Journal: Enterprise Information Systems
Volume: 8
Issue number: 1
Publication status: Published - 2014
