Review of Deep Learning for Language Modeling

doi:10.13998/j.cnki.issn1002-1248.23-0251

Abstract

[Purpose/Significance] Deep learning for language modeling is currently one of the main methods and most advanced technologies for improving the language intelligence of machines. It has become an indispensable means of automatically processing and analyzing data resources and of intelligently mining information and knowledge. However, applying deep learning language models to technology development and application services in the library and information science (LIS) field still poses difficulties. This study therefore systematically reviews the research progress, technical principles, and development methods of deep learning for language modeling, with the aim of giving librarians and fellow practitioners a reliable theoretical basis and feasible methodological paths for understanding and applying it. [Method/Process] Data were collected from the Web of Science (WOS) Core Collection, the CNKI literature database, the arXiv preprint repository, the GitHub open-source hosting platform, and open resources on the Internet. Based on these data, this paper first systematically examines the background, basic feature representation algorithms, and representative application development tools of deep learning for language modeling, traces their evolution and technical principles, and analyzes the strengths, weaknesses, and applicability of each algorithm model and development tool. Second, it analyzes in depth the challenges that the development and application of deep learning language models may face, and proposes two strategic approaches to extending their application capabilities.
[Results/Conclusions] The main challenges facing the application and development of deep learning language models are: very large numbers of parameters, which make accuracy difficult to tune; dependence on large amounts of accurate training data; difficulty in modifying the models; and intellectual property and information security issues. In the future, we will extend and improve the application capabilities of deep learning language models along two lines: specific domains and feature engineering. For specific domains, the key considerations are the collection and preparation of domain data, the choice of model architecture, the involvement of domain experts, and optimization for specific tasks, so that the model's data sources are more reliable and secure and its results are more accurate and practical. For feature engineering, the strategies are selecting appropriate features, feature pre-processing, feature selection, and feature dimensionality reduction. These strategies can improve the performance and efficiency of deep learning language models and make them better suited to specific tasks or domains. In summary, LIS institutions should be guided by the needs of scientific research and social development and build on the strengths of their existing literature data resources and knowledge services. Using deep learning language modeling technologies, they should deliver innovative intelligent knowledge management and application services for professional or vertical domains and develop technologies and systems with independent intellectual property rights; this is their path to long-term sustainable development.
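
The four feature-engineering steps listed above can be illustrated with a small, dependency-free sketch. The abstract does not say which technique the authors use at each step, so every choice here is an assumption made for illustration: min-max scaling for pre-processing, a variance threshold for feature selection, and principal component analysis (PCA, computed by power iteration) for dimensionality reduction. The first step, selecting appropriate features, is a domain decision and is represented only by the hypothetical input matrix `X`; the function names are likewise hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def min_max_scale(X: Matrix) -> Matrix:
    """Pre-processing (assumed technique): rescale each column to [0, 1];
    constant columns map to 0."""
    cols = list(zip(*X))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in X
    ]


def variance_threshold(X: Matrix, threshold: float) -> List[int]:
    """Feature selection (assumed technique): return the indices of columns
    whose population variance exceeds `threshold`."""
    keep = []
    for j, col in enumerate(zip(*X)):
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        if var > threshold:
            keep.append(j)
    return keep


def pca(X: Matrix, k: int, iters: int = 200, seed: int = 0) -> Matrix:
    """Dimensionality reduction (assumed technique): project the centred rows
    onto the top-k eigenvectors of the covariance matrix, found by power
    iteration with re-orthogonalisation against earlier components."""
    n, d = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    Xc = [[v - m for v, m in zip(row, means)] for row in X]
    # Covariance matrix C = Xc^T Xc / n, shape d x d.
    C = [[sum(r[i] * r[j] for r in Xc) / n for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components: List[List[float]] = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]  # positive random start
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            for c in components:  # Gram-Schmidt: stay orthogonal to earlier components
                dot = sum(a * b for a, b in zip(w, c))
                w = [a - dot * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            v = [a / norm for a in w] if norm > 0 else w
        components.append(v)
    return [[sum(a * b for a, b in zip(row, c)) for c in components] for row in Xc]


# Toy input: 4 samples x 4 already-chosen features (hypothetical values).
X = [
    [5.0, 1.0, 2.0, 0.0],  # column 0 is constant; column 2 = 2 * column 1
    [5.0, 2.0, 4.0, 1.0],
    [5.0, 3.0, 6.0, 0.0],
    [5.0, 4.0, 8.0, 1.0],
]
scaled = min_max_scale(X)
keep = variance_threshold(scaled, threshold=0.01)
selected = [[row[j] for j in keep] for row in scaled]
# 3 selected features -> 2 components; no variance is lost because
# columns 1 and 2 are identical after scaling.
reduced = pca(selected, k=2)
print(keep)  # [1, 2, 3]: the constant column 0 is dropped
```

In practice these steps are usually taken from a library such as scikit-learn (`MinMaxScaler`, `VarianceThreshold`, `PCA`); the pure-Python version only makes the arithmetic of each step visible. For text features feeding a language model, the input columns would come from the chosen representation rather than from a toy matrix.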

Key words: deep learning, language model, neural network, pre-trained model, word embedding

CLC Number: G202

WANG Sili, ZHANG Ling, YANG Heng, LIU Wei. Review of Deep Learning for Language Modeling[J]. Journal of library and information science in agriculture, 2023, 35(8): 4-18.
