
Journal of Library and Information Science in Agriculture ›› 2023, Vol. 35 ›› Issue (8): 4-18. DOI: 10.13998/j.cnki.issn1002-1248.23-0251


Review of Deep Learning for Language Modeling

WANG Sili1, ZHANG Ling2, YANG Heng1, LIU Wei1   

  1. Literature and Information Center of Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou 730000;
    2. School of Management, Xinxiang Medical University, Xinxiang 453003
  Received: 2023-04-20    Online: 2023-08-05    Published: 2023-12-04

Abstract: [Purpose/Significance] Deep learning for language modeling is one of the principal methods and advanced technologies for enhancing the language intelligence of machines, and it has become an indispensable technical means for the automatic processing and analysis of data resources and for the intelligent mining of information and knowledge. However, the library and information science (LIS) field still faces difficulties in applying deep learning language models to technology development and application services. This study therefore systematically reviews the research progress, technical principles, and development methods of deep learning for language modeling, with the aim of providing librarians and fellow practitioners with a reliable theoretical basis and feasible methodological paths for understanding and applying these models. [Method/Process] The data used in this study were collected from the Web of Science core database, the CNKI literature database, the arXiv preprint repository, the GitHub open-source software hosting platform, and open resources on the Internet. Based on these data, this paper first systematically surveys the background, basic feature representation algorithms, and representative application development tools of deep learning for language modeling; it traces their evolution and technical principles, and analyzes the advantages, disadvantages, and applicability of each algorithm and development tool. Second, it analyzes in depth the challenges facing the development and application of deep learning language models, and puts forward two strategic approaches for expanding their application capabilities. [Results/Conclusions] The major challenges facing the application and development of deep learning for language modeling include the large number of parameters and the difficulty of tuning them for accuracy, the reliance on large amounts of accurate training data, the difficulty of modifying trained models, and intellectual property and information security issues. In the future, application capabilities can be expanded and improved from two aspects: specific domains and feature engineering. The domain-oriented strategy focuses on the collection and preparation of domain data, the selection of model architectures, the participation of domain experts, and optimization for specific tasks, so as to make the model's data sources more reliable and secure and its application effects more accurate and practical. The feature engineering strategy includes selecting appropriate features, feature pre-processing, feature selection, and feature dimensionality reduction; these measures can improve the performance and efficiency of deep learning language models and make them more suitable for specific tasks or domains, as sketched below.
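
To make the feature engineering strategy concrete, the following minimal Python sketch (an editorial illustration, not code from the paper) chains the steps named above, feature pre-processing, feature selection, and dimensionality reduction, using scikit-learn; the toy corpus and labels are hypothetical placeholders.

    # Minimal sketch of the feature engineering pipeline described above:
    # TF-IDF pre-processing -> chi-square feature selection -> dimensionality reduction.
    # Assumes scikit-learn; the toy `texts` and `labels` are hypothetical placeholders.
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.pipeline import Pipeline

    texts = [
        "deep learning for language modeling",
        "pre-trained language models and word embeddings",
        "library knowledge service and information retrieval",
        "intelligent knowledge management in libraries",
    ]
    labels = [1, 1, 0, 0]  # e.g., 1 = language-model topic, 0 = library-service topic

    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(stop_words="english")),  # pre-processing: raw text -> weighted term features
        ("select", SelectKBest(chi2, k=8)),                # selection: keep the 8 most label-informative terms
        ("svd", TruncatedSVD(n_components=2)),             # reduction: compress to a low-dimensional space
    ])

    features = pipeline.fit_transform(texts, labels)
    print(features.shape)  # (4, 2): one 2-dimensional vector per document

In a real setting, the same pipeline would feed a downstream classifier, and the number of selected features and reduced dimensions would be tuned to the task.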
In summary, LIS institutions should leverage deep learning technologies for language modeling, guided by the needs of scientific research and social development and building on the advantages of their existing literature data resources and knowledge services. By carrying out innovative professional or vertical-domain intelligent knowledge management and application services, and by developing technologies and systems with independent intellectual property rights, they can secure a long-term, sustainable development path.
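
As a complementary illustration of the domain-oriented strategy (collecting domain data and optimizing a pre-trained model for a vertical-domain task), the following hedged Python sketch fine-tunes a pre-trained model for text classification with the Hugging Face transformers and datasets libraries; the checkpoint name and the tiny in-memory corpus are assumptions for illustration only, not artifacts of the paper.

    # Hedged sketch: fine-tuning a pre-trained model on a small domain corpus.
    # Assumes the Hugging Face `transformers` and `datasets` libraries; the
    # checkpoint and the two-example corpus are illustrative placeholders only.
    from datasets import Dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    checkpoint = "bert-base-chinese"  # assumed checkpoint; any domain-appropriate model could be used
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    # In practice this would be a curated, expert-reviewed domain dataset;
    # a toy in-memory sample stands in for it here.
    corpus = Dataset.from_dict({
        "text": ["图书馆智能知识服务", "深度学习语言模型综述"],
        "label": [0, 1],
    })
    corpus = corpus.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                             padding="max_length", max_length=32))

    args = TrainingArguments(output_dir="out", num_train_epochs=1,
                             per_device_train_batch_size=2)
    Trainer(model=model, args=args, train_dataset=corpus).train()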

Key words: deep learning, language model, neural network, pre-trained model, word embedding

CLC Number: G202