
Journal of Library and Information Science in Agriculture ›› 2021, Vol. 33 ›› Issue (1): 17-31. DOI: 10.13998/j.cnki.issn1002-1248.20-0797


• Special manuscript •

Building an Artificial Intelligence Engine Based on Scientific and Technological Literature Knowledge

ZHANG Zhixiong1,2,3,4, LIU Huan1,2,4, YU Gaihong1   

  1. National Science Library, Chinese Academy of Sciences, Beijing 100190;
    2. University of Chinese Academy of Sciences, Beijing 100049;
    3. Wuhan Library, Chinese Academy of Sciences, Wuhan 430071;
    4. Hubei Key Laboratory of Big Data in Science and Technology, Wuhan 430071
  • Received: 2020-09-29  Online: 2021-01-05  Published: 2021-02-05

Abstract: [Purpose/Significance] How to use the knowledge contained in scientific and technological literature to train and improve deep learning models, and thereby to acquire and discover knowledge, is an important subject of information research. To fully mine and exploit the value of literature knowledge, this paper proposes the goal of building an artificial intelligence (AI) engine based on scientific and technological literature knowledge. [Method/Process] Taking library and information work as its starting point, and regarding scientific and technological literature as the most important carrier of human knowledge, this paper examines the essence of the rapid breakthroughs in AI and innovatively puts forward the idea of transforming the "scientific and technological literature library" of the information science field into a "scientific and technological knowledge engine". [Results/Conclusions] This paper discusses the practice of constructing an AI engine based on scientific and technological literature knowledge and explores methods of using deep learning to mine knowledge in the service of information research, so as to provide a reference for peers.

Key words: scientific and technological literature, artificial intelligence (AI) knowledge engine, pre-trained language model, fine-tuning model, construction practice of AI engine

CLC Number: G250
[1] BELTAGY I, LO K, COHAN A. SciBERT: A pretrained language model for scientific text[J]. arXiv preprint arXiv:1903.10676, 2019.
[2] WAN H, ZHANG Y, ZHANG J, et al. AMiner: Search and mining of academic social networks[J]. Data Intelligence, 2019, 1(1): 58-76.
[3] SHOHAM Y, PERRAULT R, BRYNJOLFSSON E, et al. Artificial intelligence index 2017 annual report[R]. Artificial Intelligence Index, 2017.
[4] WU Y, SCHUSTER M, CHEN Z, et al. Google's neural machine translation system: Bridging the gap between human and machine translation[J]. arXiv preprint arXiv:1609.08144, 2016.
[5] MCCULLOCH W S, PITTS W. A logical calculus of the ideas immanent in nervous activity[J]. The Bulletin of Mathematical Biophysics, 1943, 5(4): 115-133.
[6] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[7] LECUN Y, BOSER B, DENKER J S, et al. Backpropagation applied to handwritten zip code recognition[J]. Neural Computation, 1989, 1(4): 541-551.
[8] GOODFELLOW I, BENGIO Y, COURVILLE A. Deep learning[M]. Cambridge: MIT Press, 2016.
[9] LEI M. Machine learning and application[M]. Beijing: Tsinghua University Press, 2019.
[10] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.
[11] PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations[J]. arXiv preprint arXiv:1802.05365, 2018.
[12] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[EB/OL]. 2018. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.
[13] RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners[J]. OpenAI Blog, 2019, 1(8).
[14] YANG Z, DAI Z, YANG Y, et al. XLNet: Generalized autoregressive pretraining for language understanding[J]. arXiv preprint arXiv:1906.08237, 2019.
[15] LIU Y, OTT M, GOYAL N, et al. RoBERTa: A robustly optimized BERT pretraining approach[J]. arXiv preprint arXiv:1907.11692, 2019.
[16] WANG A, SINGH A, MICHAEL J, et al. GLUE: A multi-task benchmark and analysis platform for natural language understanding[J]. arXiv preprint arXiv:1804.07461, 2018.
[17] SUN Y, WANG S, LI Y, et al. ERNIE: Enhanced representation through knowledge integration[J]. arXiv preprint arXiv:1904.09223, 2019.
[18] SUN Y, WANG S, LI Y, et al. ERNIE 2.0: A continual pre-training framework for language understanding[J]. arXiv preprint arXiv:1907.12412, 2019.
[19] CUI Y, CHE W, LIU T, et al. Pre-training with whole word masking for Chinese BERT[J]. arXiv preprint arXiv:1906.08101, 2019.
[20] OpenCLaP. Open Chinese language pre-trained model zoo project brief[EB/OL]. [2020-08-07]. http://zoo.thunlp.org/.
[21] XIE W, SHEN Y, MA Y Z. Recommendation system for paper reviewing based on graph computing[J]. Application Research of Computers, 2016, 33(3): 798-801.
[22] WANG D, LIANG Y, XU D, et al. A content-based recommender system for computer science publications[J]. Knowledge-Based Systems, 2018: S0950705118302107.
[23] YU G H, ZHANG Z X, MA N. Overview of science and technology literature discourse elements automatic annotation model research[J]. Library and Information Service, 2018, 62(15): 132-144.
[24] DE RIBAUPIERRE H, FALQUET G. An automated annotation process for the SciDocAnnot scientific document model[C]//5th International Workshop on Semantic Digital Archives, 2015.
[25] LIAKATA M, TEUFEL S, SIDDHARTHAN A, et al. Corpora for the conceptualisation and zoning of scientific papers[C]//Proceedings of the International Conference on Language Resources and Evaluation (LREC 2010), Valletta, Malta, 2010.
[26] TEUFEL S, SIDDHARTHAN A, BATCHELOR C. Towards discipline-independent argumentative zoning[C]//Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2009.
[27] FISAS B, RONZANO F, SAGGION H. A multi-layered annotated corpus of scientific papers[C]//Proceedings of LREC 2016, 2016.
[28] ZHANG Z X, LIU H, DING L P, et al. Identifying moves of research abstracts with deep learning methods[J]. Data Analysis and Knowledge Discovery, 2019, 3(12): 1-9.
[29] YU G, ZHANG Z, LIU H, et al. Masked sentence model based on BERT for move recognition in medical scientific abstracts[J]. 2019.
[30] DERNONCOURT F, LEE J Y. PubMed 200k RCT: A dataset for sequential sentence classification in medical abstracts[J]. arXiv preprint arXiv:1710.06071, 2017.
[31] JIN D, SZOLOVITS P. Hierarchical neural networks for sequential sentence classification in medical scientific abstracts[J]. arXiv preprint arXiv:1808.06161, 2018.
[32] ZHAO Y, ZHANG Z X, LIU H, et al. Classification of Chinese medical literature with BERT model[J/OL]. Data Analysis and Knowledge Discovery: 1-12[2020-08-05]. http://kns.cnki.net/kcms/detail/10.1478.G2.20200603.1457.004.html.
[33] DING L P, ZHANG Z X, LIU H, et al. Automatic keyphrase extraction from scientific Chinese medical abstracts based on character-level sequence labeling[C]//Joint Conference on Digital Library (Wuhan '20), Wuhan, 2020.
[34] SALTON G, YANG C S, YU C T. A theory of term importance in automatic text analysis[J]. Journal of the American Society for Information Science, 1975, 26(1): 33-44.
[35] HUANG Z H, XU W, YU K. Bidirectional LSTM-CRF models for sequence tagging[J]. arXiv preprint arXiv:1508.01991, 2015.
[36] NAVIGLI R, VELARDI P. Learning word-class lattices for definition and hypernym extraction[C]//Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 2010.
[37] DONG L, YANG N, WANG W, et al. Unified language model pre-training for natural language understanding and generation[C]//Advances in Neural Information Processing Systems, 2019: 13063-13075.
[1] CHEN Wen, WANG Dongliang, XU Yunhao, CHEN Yuping, YANG Youqing. The Construction of Metadata Model for Digital Resources of Cultural Creativity Works [J]. Journal of Library and Information Science in Agriculture, 0, (): 1-1.
[2] ZHOU Wenjie, YANG Kehu. An Overview of the Evidence-based Digital Humanity Paradigm with Chinese Characteristics from the Perspective of the Construction of Independent Knowledge System [J]. Journal of Library and Information Science in Agriculture, 2022, 34(11): 5-13.
[3] WEI Zhipeng, ZHAO Yueyan, YANG Kehu, ZHOU Wenjie. Evidence-based Digital Humanity Paradigm of First-hand Evidence: A Co-Word Analysis Based on Bao's Zhan Guo Ce [J]. Journal of Library and Information Science in Agriculture, 2022, 34(11): 14-25.
[4] WEN Yufeng, MA Qianni, YANG Kehu, ZHOU Wenjie. Exploring the Paradigm of Secondhand Evidence Based Digital Humanity Research: A Case Study of Dunhuang Wooden and Bamboo Slips of Han Dynasty [J]. Journal of Library and Information Science in Agriculture, 2022, 34(11): 26-37.
[5] SHANG Hongli, ZHANG Sijie, WEI Zhipeng, YANG Kehu, ZHOU Wenjie. Evidence Integration Framework of Evidence-based Digital Humanities [J]. Journal of Library and Information Science in Agriculture, 2022, 34(11): 38-47.
[6] WANG Fei, QIU Fengjie, WANG Hao. Library Green Transformation from the Prospect of Public Libraries' High-Quality Development [J]. Journal of Library and Information Science in Agriculture, 2022, 34(11): 57-68.
[7] XIAO Keyi, QIN Jiajia, LI Yunfan. Practice and Enlightenment of Japanese University Libraries in Using Institutional Repositories for Research Data Management [J]. Journal of Library and Information Science in Agriculture, 2022, 34(11): 100-109.
[8] ZHU Yiping, ZHU Yi, ZHANG Cheng. Influencing Factors of User Participation Behavior in Online Health Community under the Dimension of Emotional Experience [J]. Journal of Library and Information Science in Agriculture, 2022, 34(10): 15-18.
[9] XIONG Huan, LUO Aijing, XIE Wenzhao, HUANG Panhao. Status and Influencing Factors of Health Information Literacy of the Rural Elderly [J]. Journal of Library and Information Science in Agriculture, 2022, 34(10): 44-56.
[10] FAN Zhixuan, WANG Jian, SA Xu, ZHANG Guilan. Structure-Utility of Descriptive Information of Agricultural Scientific Data from the Perspective of Users [J]. Journal of Library and Information Science in Agriculture, 2022, 34(10): 57-69.
[11] XIN Xiuqin. Analysis on the Key Contents of New Media Ecological Innovation and Development in Public Libraries [J]. Journal of Library and Information Science in Agriculture, 2022, 34(10): 91-100.
[12] CHEN Xuefei, HUANG Jinxia, WANG Fang. Open Science: Connotation of Open Innovation and Its Mechanism for Innovation Ecology [J]. Journal of Library and Information Science in Agriculture, 2022, 34(9): 5-14.
[13] WANG Lili, ZHANG Ning. Knowledge Correlation of Chinese Ancient Books from the Perspective of Digital Humanities [J]. Journal of Library and Information Science in Agriculture, 2022, 34(9): 51-59.
[14] XIAO Man, WANG Xuan, WANG Fang, HUANG Jinxia. Framework and Development Path of Open Science Capability [J]. Journal of Library and Information Science in Agriculture, 2022, 34(9): 15-28.
[15] LIU Jingyu, JIA Yujie, HUANG Jinxia, WANG Fang. Ethics Principle Framework of Data Handling for Open Scientific Innovation Ecology [J]. Journal of Library and Information Science in Agriculture, 2022, 34(9): 29-43.