Journal of Library and Information Science in Agriculture ›› 2023, Vol. 35 ›› Issue (3): 15-24. doi: 10.13998/j.cnki.issn1002-1248.23-0059

Literature Classification Methods based on Structural Information Enhancement

AN Bo1,2   

  1. Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences, Beijing 100081;
  2. Institute of Software, Chinese Academy of Sciences, Beijing 100190
  • Received:2023-02-08 Online:2023-03-05 Published:2023-05-31

Abstract: [Purpose/Significance] Literature classification is a fundamental task in library and information services and is of great value for information resource management and for literature retrieval and acquisition. Deep learning-based methods are currently the mainstream approach to text classification; they employ neural networks to model the textual content of the literature. This approach uses only the information contained in each document and ignores the associations between documents. By observing the data, we found that documents in the same category tend to share more keywords. Documents can therefore be linked through keywords into an association network that captures the structural relationships between them. We attempt to use this structural information to improve the performance of literature classification. [Methods/Process] This paper proposes a method that models a structural representation of the literature and uses this representation to enhance traditional literature classification methods. Specifically, we first constructed a large-scale keyword dictionary from the collected data of about 930,000 documents. Second, we extracted the keyword set from the titles and abstracts of papers with a two-way (bidirectional) maximum matching algorithm and constructed a keyword-literature graph, with documents and keywords as nodes and the inclusion relationship between documents and keywords as edges, so that documents are connected to each other through shared keywords. We then employed a graph convolutional network (GCN) to model the keyword-literature graph and learn representations of the documents and keywords; the document representations learned in this way encode the structural relationships between documents. In addition, we employed BERT+BiLSTM to model the textual content of the literature. Finally, the structural and textual representations of each document were concatenated, and classification was performed on the concatenated representation. [Results/Conclusions] We constructed a literature classification dataset containing 423 classes and split it into training, validation, and test sets at a ratio of 8:1:1. We conducted literature classification experiments on this dataset. The experimental results show that the structural information of the literature can effectively enhance the performance of traditional literature classification methods. The ablation experiments also show that structural information alone is insufficient for the literature classification task. Through detailed analysis of the misclassified data, we found that the model still has problems handling less frequent keywords and concepts. In the future, we plan to use few-shot learning methods to address classification for literature categories with little data.
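The keyword extraction and graph construction steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names, the tie-breaking rule (fewer tokens, then fewer single characters, then the backward result), and the toy dictionary and documents are all hypothetical. The abstract does not specify these details.

```python
from collections import defaultdict


def forward_max_match(text, vocab, max_len):
    """Scan left to right, always taking the longest dictionary entry."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing matches.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len):
    """Scan right to left, always taking the longest dictionary entry."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    return tokens[::-1]


def extract_keywords(text, vocab):
    """Two-way matching: run both scans and keep the better segmentation."""
    max_len = max(map(len, vocab), default=1)
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)

    def cost(tokens):
        # Assumed heuristic: fewer tokens first, then fewer single characters.
        return (len(tokens), sum(len(t) == 1 for t in tokens))

    best = fwd if cost(fwd) < cost(bwd) else bwd  # ties go to backward
    return {t for t in best if t in vocab}


def build_keyword_document_graph(docs, vocab):
    """Return (document, keyword) edges and a keyword -> documents index."""
    edges, docs_by_keyword = [], defaultdict(set)
    for doc_id, text in docs.items():
        for kw in sorted(extract_keywords(text, vocab)):
            edges.append((doc_id, kw))
            docs_by_keyword[kw].add(doc_id)
    return edges, docs_by_keyword


# Hypothetical toy data: two titles that share one dictionary keyword.
VOCAB = {"图神经网络", "神经网络", "文本分类", "文献分类", "分类"}
DOCS = {
    "d1": "基于图神经网络的文本分类",
    "d2": "图神经网络与文献分类方法",
}
EDGES, DOCS_BY_KEYWORD = build_keyword_document_graph(DOCS, VOCAB)
```

In this toy graph the two documents become two-hop neighbours through the shared keyword node, which is the kind of structural link a graph convolutional network can propagate information along.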

Key words: literature classification, graph convolution network, keyword-literature graph, semantic association, knowledge organization, natural language processing

CLC Number: TP393