Name Disambiguation for Chinese Scientific Authors with Multi-Level Clustering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Name ambiguity arises when a search retrieves publications written by distinct authors who share the same name. Over the past decades, many supervised and unsupervised methods have been proposed to resolve author-name ambiguity and improve retrieval performance for digital libraries and similar websites such as Citeseer and WanFang. However, most of them either require large amounts of annotated data or do not scale to massive data sets. In this paper, we propose an entirely unsupervised framework for disambiguation: a multi-level clustering algorithm that builds a discipline tree in which paper and author entities are matched. To speed up construction of the discipline tree, we implement an efficient seeding algorithm for sequential k-Means and design a strategy for quickly estimating k, the number of clusters. Experimental results show that our framework is efficient on large-scale data sets and works well for name disambiguation of Chinese scientific authors.
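As a rough illustration of the clustering step mentioned in the abstract, the sketch below shows a generic sequential (single-pass) k-Means with distance-weighted seeding. The abstract does not describe the paper's actual seeding algorithm, its feature representation, or its strategy for estimating k, so everything here is an assumption: the seeding is a k-means++-style stand-in, the inputs are plain numeric vectors, k is supplied by the caller, and the names `seed_centers` and `sequential_kmeans` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seed_centers(points, k, rng):
    """k-means++-style seeding (an assumed stand-in, not the paper's method).

    The first center is picked uniformly at random; each later center is
    drawn with probability proportional to its squared distance from the
    nearest center chosen so far.
    """
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with an existing center; fall back to uniform.
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centers


def sequential_kmeans(points, k, seed=0):
    """One pass over the data: assign each point to its nearest center,
    then move that center to the running mean of the points assigned to it."""
    rng = random.Random(seed)
    centers = seed_centers(points, k, rng)
    counts = [0] * k
    labels = []
    for p in points:
        j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
        counts[j] += 1
        step = 1.0 / counts[j]  # running-mean update weight
        centers[j] = [c + step * (x - c) for c, x in zip(centers[j], p)]
        labels.append(j)
    return centers, labels
```

Because each point is visited once and only one center is updated per point, the cost of a pass is linear in the number of points for fixed k, which is the usual reason to prefer a sequential variant on large collections.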

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Computational Science and Engineering and IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, CSE and EUC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 176-182
Number of pages: 7
ISBN (Electronic): 9781538632215
State: Published - 8 Aug 2017
Event: 20th IEEE International Conference on Computational Science and Engineering and 15th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, CSE and EUC 2017 - Guangzhou, Guangdong, China
Duration: 21 Jul 2017 → 24 Jul 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Computational Science and Engineering and IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, CSE and EUC 2017
Volume: 1

Conference

Conference: 20th IEEE International Conference on Computational Science and Engineering and 15th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, CSE and EUC 2017
Country/Territory: China
City: Guangzhou, Guangdong
Period: 21/07/17 → 24/07/17

Keywords

  • Automatic Name Disambiguation
  • Multi-level Clustering
  • Scientific Databases
  • Unsupervised Learning
