Journal of Library and Information Science in Agriculture ›› 2022, Vol. 34 ›› Issue (10): 82-90. doi: 10.13998/j.cnki.issn1002-1248.21-0906

• Research Paper •

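A Survey of Author Name Disambiguation Techniques of Academic Papers

WANG Xin, LU Yao, YUAN Xue, ZHAO Wanjing, CHEN Li, LIU Minjuan*

  1. Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing 100081
  • Received: 2021-11-22 Online: 2022-10-05 Published: 2022-11-28
  • Corresponding author: * LIU Minjuan, associate research librarian; research interests: information resource organization and management. E-mail: liuminjuan@caas.cn
  • About the authors: WANG Xin, librarian; research interests: digital resource management and information organization. LU Yao, associate research fellow; research interests: construction and organization of agricultural information resources. YUAN Xue, librarian; research interests: information resource organization and management. ZHAO Wanjing, assistant research fellow; research interests: information organization technology and practice. CHEN Li, librarian; research interests: information resource organization.
  • Supported by:
    2022 Science and Technology Innovation Program of the Agricultural Information Institute, Chinese Academy of Agricultural Sciences, "Digital CAAS 3.0 Construction" (CAAS-ASTIP-2016-AII)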

Abstract: [Purpose/Significance] This paper surveys recent research on author name disambiguation and traces its development from the perspective of how data shape disambiguation methods, so as to provide a reference for further research. [Method/Process] Papers related to author name disambiguation were collected from English-language sources (Web of Science, Scopus, Google Scholar, ACM Digital Library, IEEE Xplore, ScienceDirect and Springer Link) and from Chinese-language databases (CNKI, CQVIP and WANFANG). The search results cover relevant papers published from 1998 to 2021. Weighing authority, influence and novelty, 46 publications were selected for review. The data used in author name disambiguation vary in type and structure. For example, literature feature information is generally presented as unstructured text, and the features extracted from it can be stored and represented in two-dimensional tables; citation information and interpersonal relationships are network (relational) data, which can be stored and represented as graphs, key-value pairs or two-dimensional tables. These structural differences stem from semantic differences in the data, but it is the data structure itself that determines which algorithms are applicable. According to the structure of the feature data used in the disambiguation task and the corresponding data processing algorithms, the research is divided into three categories: 1) disambiguation methods based on literature features, 2) disambiguation methods based on social networks, and 3) disambiguation methods that integrate external knowledge. The influence of data on author name disambiguation methods is then examined at the data level. [Results/Conclusions] The analysis finds that, with advances in technology, deep learning methods have been widely adopted. Compared with improvements to the models themselves, feature learning and representation based on deep learning yield a more pronounced improvement in the performance of author name disambiguation algorithms. In addition, to overcome the insufficient use of data by any single method and to make fuller use of the information the data contain, the three categories of methods are increasingly combined so that they complement one another. The literature review also shows that few studies address incremental or cross-language author name disambiguation, which suggests directions for further research.

Key words: knowledge organization, author name disambiguation, person name disambiguation

CLC Number: 

  • G353.1

Cite this article

WANG Xin, LU Yao, YUAN Xue, ZHAO Wanjing, CHEN Li, LIU Minjuan. A Survey of Author Name Disambiguation Techniques of Academic Papers[J]. Journal of Library and Information Science in Agriculture, 2022, 34(10): 82-90.