
Journal of Library and Information Science in Agriculture, 2022, Vol. 34, Issue 10: 82-90. doi: 10.13998/j.cnki.issn1002-1248.21-0906


A Survey of Author Name Disambiguation Techniques of Academic Papers

WANG Xin, LU Yao, YUAN Xue, ZHAO Wanjing, CHEN Li, LIU Minjuan*   

  1. Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing 100081
  • Received:2021-11-22 Online:2022-10-05 Published:2022-11-28

Abstract: [Purpose/Significance] This paper surveys recent research on author name disambiguation and traces its development from the perspective of how data shape disambiguation methods, so as to provide a reference for further research. [Method/Process] Papers on author name disambiguation were collected from English-language databases, including Web of Science, Scopus, Google Scholar, ACM Digital Library, IEEE Xplore, ScienceDirect and Springer Link, and from Chinese databases, including CNKI, CQVIP and WANFANG. The search results cover papers published from 1998 to 2021. Weighing authority, influence and novelty, 46 publications were selected for review. The data used for author name disambiguation vary in type and structure. For example, literature features are generally presented as unstructured text, and the extracted features can be stored and represented in two-dimensional tables; citation information and interpersonal relationships are network relational data, which can be stored and represented as graphs, key-value pairs or two-dimensional tables. These structural differences stem from semantic differences in the data, but it is the data structure itself that determines which algorithms are applicable. According to the structure of the feature data used in the disambiguation task and the corresponding data processing algorithms, the research is divided into three categories: 1) methods based on literature features, 2) methods based on social networks and 3) methods that integrate external knowledge. The impact of data on author name disambiguation methods is then examined at the data level. [Results/Conclusions] The analysis found that, as technology has advanced, deep learning methods have come into wide use. Compared with improvements to the model, deep-learning-based feature learning and representation can significantly improve the performance of author name disambiguation algorithms. In addition, to overcome the insufficient use of data by any single method, the three categories of methods are increasingly being combined so that they complement one another. The literature review also shows that few studies address incremental or multi-language author name disambiguation, which could be directions for further research.
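As a rough illustration of the point that the same relational data can be held as a two-dimensional table, as key-value pairs or as a graph, the following Python sketch stores a toy co-author relation in all three forms and then groups papers that share a co-author. The records, the names and the connected-components grouping are hypothetical and chosen only for illustration; they are not taken from any of the surveyed methods.

```python
from collections import defaultdict

# Hypothetical toy records: three papers that all carry the same ambiguous
# author name, each with its own (made-up) co-author list.
papers = [
    {"id": "p1", "coauthors": ["Coauthor A", "Coauthor B"]},
    {"id": "p2", "coauthors": ["Coauthor B"]},
    {"id": "p3", "coauthors": ["Coauthor C"]},
]

# 1) Two-dimensional table: one (paper, coauthor) row per relation.
edge_table = [(p["id"], c) for p in papers for c in p["coauthors"]]

# 2) Key-value pairs: each co-author maps to the set of papers they appear on.
papers_by_coauthor = defaultdict(set)
for paper_id, coauthor in edge_table:
    papers_by_coauthor[coauthor].add(paper_id)

# 3) Graph: two papers are adjacent when they share at least one co-author.
graph = {p["id"]: set() for p in papers}
for ids in papers_by_coauthor.values():
    for paper_id in ids:
        graph[paper_id] |= ids - {paper_id}


def connected_components(adjacency):
    """Group nodes into connected components with an iterative depth-first search."""
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Each component is treated here as one candidate real-world author.
clusters = connected_components(graph)
print(clusters)
```

The table form suits feature-based processing, while the graph form makes network algorithms such as the component search above straightforward, which is one way to read the abstract's remark that the data structure determines the applicable algorithm.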

Key words: knowledge organization, author name disambiguation, person name disambiguation

CLC Number: G353.1