Journal of Library and Information Science in Agriculture ›› 2022, Vol. 34 ›› Issue (3): 15-27. doi: 10.13998/j.cnki.issn1002-1248.21-0707

• Invited Article •

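A Deep Learning Based Approach to Structural Function Recognition of Scientific Literature Abstracts

MAO Jin1,2, CHEN Ziyang1,2

  1. Center for Studies of Information Resources, Wuhan University, Wuhan 430072;
  2. School of Information Management, Wuhan University, Wuhan 430072
  • Received: 2021-09-17 Online: 2022-03-05 Published: 2022-04-27
  • About the authors: MAO Jin, associate professor and master's supervisor; research interests: information organization and big data analysis. CHEN Ziyang, master's student; research interest: information organization.
  • Funding:
    National Natural Science Foundation of China, Young Scientists Fund project "Knowledge Community Discovery Based on Representation Learning of Academic Heterogeneous Networks" (71804135); National Natural Science Foundation of China, General Program project "Detection of Scientific Knowledge Innovation and Co-evolution Analysis Based on 'Problem-Method' Association Identification" (72174154)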

Abstract: [Purpose/Significance] Abstracts of scientific papers typically consist of parts that each serve a specific function. Identifying these structural functions with deep learning supports in-depth analysis of scientific literature text. [Method/Process] This paper recasts structural function recognition for abstracts as a text classification problem with four categories: Introduction, Methods, Results, and Conclusions (IMRC). Using the content of each abstract sentence together with its context features, classifiers are built with BERT, BERT-BiLSTM, BERT-TextCNN, and ERNIE to identify structural functions automatically. [Results/Conclusions] Experiments on a dataset of 3,130 articles in the eHealth field show that ERNIE scores higher than the other models on every metric, that BERT-TextCNN performs better on short sentences, and that BERT-BiLSTM performs better on long sentences. The study contributes to fine-grained functional understanding of abstract text, and the resulting structural analysis can support deep mining of scientific literature and literature-based knowledge discovery.

Key words: deep learning, BERT, literature structure, function identification, text classification
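
The task formulation described in the abstract, classifying each abstract sentence into one of the four IMRC categories from the sentence and its context, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which context features are used, so the neighbouring-sentence window and the relative position below are assumptions, as are the names `Example` and `build_examples` and the toy sentences.

```python
from dataclasses import dataclass
from typing import List, Tuple

# The four structural-function categories named in the abstract (IMRC).
LABELS = ("Introduction", "Methods", "Results", "Conclusions")


@dataclass
class Example:
    text: str        # the sentence to classify
    context: str     # neighbouring sentences (assumed context feature)
    position: float  # relative position in the abstract, 0.0 to 1.0 (assumed)
    label: str       # one of LABELS


def build_examples(sentences: List[Tuple[str, str]], window: int = 1) -> List[Example]:
    """Turn one abstract, given as ordered (sentence, label) pairs,
    into per-sentence classification examples."""
    n = len(sentences)
    examples = []
    for i, (sentence, label) in enumerate(sentences):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        # Up to `window` sentences before and after the target sentence.
        left = [s for s, _ in sentences[max(0, i - window):i]]
        right = [s for s, _ in sentences[i + 1:i + 1 + window]]
        position = i / (n - 1) if n > 1 else 0.0
        examples.append(Example(sentence, " ".join(left + right), position, label))
    return examples


# Toy abstract, invented for illustration only (not from the paper's dataset).
toy_abstract = [
    ("Structured abstracts aid retrieval.", "Introduction"),
    ("We fine-tune a sentence classifier.", "Methods"),
    ("Accuracy improves on the test set.", "Results"),
    ("The approach supports text mining.", "Conclusions"),
]
examples = build_examples(toy_abstract)
```

Each `Example` could then be fed to a sentence-pair classifier, for instance a fine-tuned BERT or ERNIE model that takes `text` and `context` as its two input segments; that pairing is likewise an assumption rather than something the abstract states.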

CLC Number:

  • G353.1

Cite this article

MAO Jin, CHEN Ziyang. A Deep Learning Based Approach to Structural Function Recognition of Scientific Literature Abstracts[J]. Journal of Library and Information Science in Agriculture, 2022, 34(3): 15-27.