TY - GEN
T1 - Name Disambiguation for Chinese Scientific Authors with Multi-Level Clustering
AU - Sun, Simeng
AU - Zhang, Hui
AU - Li, Ning
AU - Chen, Yong
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/8/8
Y1 - 2017/8/8
N2 - Name ambiguity arises when one retrieves publications written by distinct author entities who share the same name. Over the past decades, many supervised and unsupervised methods have been proposed to resolve author name ambiguity and improve retrieval performance for digital libraries and similar websites such as Citeseer and WanFang. However, most of them either need large amounts of annotated data or cannot scale to massive data sets. In this paper, we propose an entirely unsupervised framework for effective disambiguation: a multi-level clustering algorithm that builds a discipline tree in which paper and author entities are matched. To speed up construction of the discipline tree, we implement an efficient seeding algorithm for sequential k-Means and design a strategy for quickly estimating k, the number of clusters. Experimental results show that our framework is efficient on large-scale data sets and works well for name disambiguation of Chinese scientific authors.
AB - Name ambiguity arises when one retrieves publications written by distinct author entities who share the same name. Over the past decades, many supervised and unsupervised methods have been proposed to resolve author name ambiguity and improve retrieval performance for digital libraries and similar websites such as Citeseer and WanFang. However, most of them either need large amounts of annotated data or cannot scale to massive data sets. In this paper, we propose an entirely unsupervised framework for effective disambiguation: a multi-level clustering algorithm that builds a discipline tree in which paper and author entities are matched. To speed up construction of the discipline tree, we implement an efficient seeding algorithm for sequential k-Means and design a strategy for quickly estimating k, the number of clusters. Experimental results show that our framework is efficient on large-scale data sets and works well for name disambiguation of Chinese scientific authors.
KW - Automatic Name Disambiguation
KW - Multi-level Clustering
KW - Scientific Databases
KW - Unsupervised Learning
UR - https://www.scopus.com/pages/publications/85034633828
U2 - 10.1109/CSE-EUC.2017.39
DO - 10.1109/CSE-EUC.2017.39
M3 - Conference contribution
AN - SCOPUS:85034633828
T3 - Proceedings - 2017 IEEE International Conference on Computational Science and Engineering and IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, CSE and EUC 2017
SP - 176
EP - 182
BT - Proceedings - 2017 IEEE International Conference on Computational Science and Engineering and IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, CSE and EUC 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 20th IEEE International Conference on Computational Science and Engineering and 15th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, CSE and EUC 2017
Y2 - 21 July 2017 through 24 July 2017
ER -