A Deep Learning Based Approach to Structural Function Recognition of Scientific Literature Abstracts

doi:10.13998/j.cnki.issn1002-1248.21-0707

Abstract

Abstract: [Purpose/Significance] Abstracts of scientific documents are often composed of sections with specific functions. Using deep learning to identify the structural functions of these abstracts supports in-depth analysis of the documents. [Method/Process] This paper treats the identification of structural functions in scientific abstracts as a text classification problem and divides the functions into four categories: "introduction, methods, results, conclusions (IMRC)". Using the text content and context features of abstract sentences, classifiers are built on deep learning models such as BERT, BERT-BiLSTM, BERT-TextCNN and ERNIE to automatically identify the structural functions of scientific abstracts. [Results/Conclusions] Experiments are carried out on a dataset of 3,130 articles in the field of eHealth. The results show that ERNIE scores higher on the evaluation indicators than the other models. The BERT-TextCNN model is better at dealing with short text, while the BERT-BiLSTM model is better at handling long sentences. The proposed method supports fine-grained functional understanding of scientific literature abstracts and is of great significance to the in-depth mining of scientific literature and to literature-based knowledge discovery.
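The framing described in the abstract (each abstract sentence becomes one classification example with an IMRC label plus context features) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' pipeline: the abstract does not say which context features are used, so the neighbouring-sentence context, the relative position value, the integer label ids, and the names `SentenceExample` and `build_examples` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# The four IMRC categories named in the abstract; the integer ids are an assumption.
LABELS = ["introduction", "methods", "results", "conclusions"]
LABEL_TO_ID = {name: i for i, name in enumerate(LABELS)}


@dataclass
class SentenceExample:
    text: str                 # the sentence to classify
    context: str              # neighbouring sentences (hypothetical context feature)
    position: float           # relative position in the abstract, 0.0 to 1.0 (hypothetical)
    label_id: Optional[int]   # IMRC class id, or None when the sentence is unlabelled


def build_examples(
    sentences: List[str], labels: Optional[List[str]] = None
) -> List[SentenceExample]:
    """Turn one abstract (a list of sentences) into sentence-level examples."""
    if labels is not None and len(labels) != len(sentences):
        raise ValueError("labels and sentences must have the same length")

    examples = []
    last = max(len(sentences) - 1, 1)  # avoids division by zero for one-sentence abstracts
    for i, sentence in enumerate(sentences):
        previous = sentences[i - 1] if i > 0 else ""
        following = sentences[i + 1] if i + 1 < len(sentences) else ""
        # Join whichever neighbours exist into a single context string.
        context = " ".join(part for part in (previous, following) if part)
        label_id = LABEL_TO_ID[labels[i]] if labels is not None else None
        examples.append(SentenceExample(sentence, context, i / last, label_id))
    return examples
```

Under these assumptions, each `(text, context)` pair could then be encoded as a sentence pair and passed to a BERT-style classifier with four output classes; that modelling step is left out here because the abstract does not specify its details.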

Key words: deep learning, BERT, literature structure, function identification, text classification

CLC Number: G353.1

MAO Jin, CHEN Ziyang. A Deep Learning Based Approach to Structural Function Recognition of Scientific Literature Abstracts[J]. Journal of Library and Information Science in Agriculture, 2022, 34(3): 15-27.
