
Journal of Library and Information Science in Agriculture, 2025, Vol. 37, Issue (6): 55-69. doi: 10.13998/j.cnki.issn1002-1248.25-0275


Identification of Emerging Technology Topics and Prediction of Trends Using a Method Integrating BERTopic and IWOA-BiLSTM Models

CHEN Yuanyuan (1,2), FU Bin (3), GAO Yuan (3), QIAO Junwei (1,2)

  1. Shanghai Publishing and Printing College, Shanghai 200093
  2. Key Laboratory of Intelligent and Green Flexographic Printing, National Press and Publication Administration, Shanghai 200093
  3. College of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054
  • Received: 2025-03-27  Online: 2025-06-05  Published: 2025-09-16

Abstract:

[Purpose/Significance] With the rapid advancement of global science and technology, emerging technologies are constantly evolving, placing higher demands on national strategic planning and resource allocation. Artificial intelligence (AI) is a core driver of the current technological revolution, so the evolution of its technical topics needs close attention in order to anticipate disruptive changes and guide the direction of innovation. Existing research focuses primarily on identifying technical topics through bibliometric or patent analysis, while quantitative forecasting of their future development remains insufficient. To address this gap, this study proposes an integrated analytical framework that combines BERTopic-based topic modeling with an IWOA-optimized BiLSTM neural network, unifying topic identification and trend forecasting. Compared with traditional LDA models or subjective expert judgment, the method offers advances in semantic representation, model optimization, and prediction accuracy. It expands the theoretical boundaries of emerging technology forecasting and offers quantitative support for science and technology policy-making.

[Method/Process] This study used 22,243 AI-related patent records collected from 2015 to 2024. BERTopic, which integrates BERT-based semantic embeddings with HDBSCAN density-based clustering to improve both the coherence and granularity of topic extraction, was applied to extract representative technology topics from patent abstracts. A multi-dimensional evaluation system was constructed from three indicators (novelty, impact, and growth rate) that capture different aspects of emerging technologies. The CRITIC method was employed to assign weights to each dimension objectively, enhancing the robustness and balance of the composite index. For trend prediction, an Improved Whale Optimization Algorithm (IWOA) was introduced to tune the BiLSTM's learning rate, epoch count, and hidden layer size. IWOA enhances global optimization through Gaussian chaos initialization, a Lévy flight strategy, nonlinear control factors, and elite reverse learning.

[Results/Conclusions] Experimental results show that BERTopic achieves higher topic coherence than the baseline models and identifies five emerging technical areas: Intelligent Models and Algorithms, Information Processing, Deep Neural Networks, Smart Robotics, and Numerical Control Systems. The IWOA-BiLSTM model outperforms conventional LSTM and BiLSTM models on all three error metrics (MAPE, RMSE, and MAE), confirming its predictive advantage. The forecasts indicate that these emerging topics will experience sustained growth over the next five years, reflecting strong application potential and industrial value. This study confirms the feasibility and effectiveness of the integrated "identification–prediction" framework, providing a data-driven tool for strategic decision-making in science and technology development. Limitations include dependence on data quality and a current focus on the field of AI. Future research should extend the framework to other strategic areas, such as renewable energy, biomedicine, and intelligent manufacturing, to further validate its generalizability.
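The CRITIC weighting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used formulation (min-max normalization, then information content C_j = sigma_j * sum_k (1 - r_jk), then w_j = C_j / sum C), that all three indicators are benefit-type, and that no indicator is constant. The function name `critic_weights` and the sample values are hypothetical and are not taken from the paper.

```python
import math

def critic_weights(rows):
    """Return CRITIC weights for the indicator columns of `rows`.

    `rows` is a list of records, one per topic, each holding the same
    number of indicator values. Assumes no column is constant.
    """
    n, m = len(rows), len(rows[0])
    cols = [[r[j] for r in rows] for j in range(m)]

    # Min-max normalise each indicator to [0, 1] (benefit-type assumed).
    norm = []
    for c in cols:
        lo, hi = min(c), max(c)
        norm.append([(v - lo) / (hi - lo) for v in c])

    means = [sum(c) / n for c in norm]
    # Contrast intensity: sample standard deviation of each indicator.
    stds = [
        math.sqrt(sum((v - mu) ** 2 for v in c) / (n - 1))
        for c, mu in zip(norm, means)
    ]

    def pearson(j, k):
        cov = sum(
            (x - means[j]) * (y - means[k]) for x, y in zip(norm[j], norm[k])
        ) / (n - 1)
        return cov / (stds[j] * stds[k])

    # Information content: contrast times conflict with the other indicators.
    info = [
        stds[j] * sum(1 - pearson(j, k) for k in range(m)) for j in range(m)
    ]
    total = sum(info)
    return [c / total for c in info]

# Hypothetical (novelty, impact, growth rate) values for four topics.
sample = [
    [0.2, 0.5, 0.1],
    [0.4, 0.3, 0.6],
    [0.9, 0.8, 0.3],
    [0.6, 0.1, 0.9],
]
weights = critic_weights(sample)
print(weights)
```

An indicator receives more weight when it varies strongly across topics and is weakly correlated with the other indicators, which is why the method is described as objective: no expert-assigned weights are needed.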

Key words: emerging technologies, topic identification, trend prediction, BERTopic, IWOA-BiLSTM

CLC Number: G353

Fig. 1 Research framework diagram

Fig. 2 Schematic diagram of the LSTM architecture

Fig. 3 Schematic diagram of the BiLSTM architecture

Fig. 4 AI patent applications by year

Fig. 5 Topic coherence score curve

Fig. 6 Topic coherence comparison of different models

Table 1 Topic-term lexicon

Topic 0  Topic 1  Topic 2  Topic 3  Topic 4  Topic 5  Topic 6  Topic 7  Topic 8  Topic 9  Topic 10
数据用户人机交互场所存储模块控制器终端机器人数据项视频算法
模型响应界面智能家居输出模块驱动界面智能用户界面提取神经网络
机器学习内容受控人机交互电子封装指令定位数据包软件人工智能
训练检索机器学习网络平台服务输出交互安装交互式技术机器学习
图像接收显示服务器模块电机图形自动跟踪影像神经元
人工智能个性化显示装置智能接口旋转用户界面驱动交互人机交互损失函数
信息话题影像客户机电源彩色显示指令优化共享存储器技术
预测网络画面前景电源模块开关计算内部封装数据数字信号迭代
特征推送图像处理共享数控系统图像特征旋转虚拟控件卷积
设备身份分级选择总线电机人工智能辅助技术线路网络

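Table 2 Topic labeling results

Topic     Description                          Topic      Description
Topic 0   Intelligent Models and Algorithms    Topic 6    Computer Systems
Topic 1   Information Processing               Topic 7    Smart Robotics
Topic 2   Display Devices                      Topic 8    Numerical Control Systems
Topic 3   Pan-Intelligent Technologies         Topic 9    Speech Recognition
Topic 4   Smart Hardware Modules               Topic 10   Deep Neural Networks
Topic 5   Signal Processing Technologies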

Table 3 Emergence score results for each topic from 2015 to 2024

Year  Topic 0  Topic 1  Topic 2  Topic 3  Topic 4  Topic 5  Topic 6  Topic 7  Topic 8  Topic 9  Topic 10
2015  0.330    0.247    0.346    0.356    0.565    0.590    0.336    0.359    0.310    0.366    0.274
2016  0.356    0.298    0.366    0.353    0.560    0.570    0.356    0.261    0.271    0.336    0.325
2017  0.355    0.364    0.336    0.366    0.640    0.550    0.316    0.293    0.306    0.356    0.341
2018  0.379    0.294    0.360    0.346    0.560    0.590    0.356    0.315    0.330    0.336    0.357
2019  0.328    0.359    0.340    0.346    0.590    0.620    0.366    0.376    0.392    0.356    0.339
2020  0.326    0.342    0.336    0.366    0.560    0.530    0.346    0.412    0.427    0.346    0.398
2021  0.381    0.376    0.342    0.336    0.450    0.470    0.366    0.385    0.406    0.376    0.407
2022  0.468    0.440    0.336    0.356    0.440    0.460    0.345    0.427    0.445    0.336    0.449
2023  0.495    0.431    0.356    0.336    0.420    0.440    0.326    0.483    0.512    0.366    0.484
2024  0.537    0.463    0.316    0.346    0.400    0.430    0.336    0.444    0.523    0.326    0.472
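Sequence models such as BiLSTM are usually trained on fixed-length windows cut from a series like the yearly scores in Table 3. The sketch below shows one way to build such training pairs from the Topic 0 column. The helper name `make_windows` and the window length of 3 are illustrative assumptions; the page does not state how the authors prepared their inputs.

```python
def make_windows(series, window):
    """Split a series into (input window, next value) training pairs."""
    return [
        (series[i:i + window], series[i + window])
        for i in range(len(series) - window)
    ]

# Topic 0 emergence scores for 2015-2024, copied from Table 3.
topic0 = [0.330, 0.356, 0.355, 0.379, 0.328, 0.326, 0.381, 0.468, 0.495, 0.537]

# window=3 is an arbitrary illustrative choice, not a value from the paper.
pairs = make_windows(topic0, window=3)
for inputs, target in pairs:
    print(inputs, "->", target)
```

With ten yearly observations and a window of three, only seven training pairs are available, which shows how short these series are and why hyperparameter choices can matter for such a model.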

Fig. 7 Emergence scores of emerging technology topics

Fig. 8 Emergence scores of promising technology topics

Fig. 9 Emergence scores of declining technology topics

Table 4 Optimal parameters of the IWOA-BiLSTM model

Topic      Description                          Parameters
Topic 0    Intelligent Models and Algorithms    [0.009974361578792838, 99, 43, 24]
Topic 1    Information Processing               [0.001931245897102367, 51, 177, 27]
Topic 7    Smart Robotics                       [0.006741832658148721, 97, 81, 132]
Topic 8    Numerical Control Systems            [0.005631027284216848, 26, 113, 12]
Topic 10   Deep Neural Networks                 [0.004836295528164477, 50, 199, 8]
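The abstract names a Lévy flight strategy as one of the IWOA components used in the parameter search. A minimal sketch of a Lévy-distributed step, using Mantegna's algorithm, is given here for orientation. The exponent `beta=1.5`, the `scale` factor, and the `perturb` update rule are illustrative assumptions and are not taken from the paper; the paper's actual IWOA update equations are not shown on this page.

```python
import math
import random

def levy_sigma(beta):
    """Standard deviation of the numerator draw in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)

def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step length: u / |v|^(1/beta)."""
    u = rng.gauss(0.0, levy_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

def perturb(position, best, scale=0.01, beta=1.5, rng=random):
    """Hypothetical update: jitter a candidate relative to the best one."""
    return [
        x + scale * levy_step(beta, rng) * (x - b)
        for x, b in zip(position, best)
    ]

rng = random.Random(0)  # seeded only so that runs are repeatable
candidate = [0.005, 50.0]   # e.g. (learning rate, epochs), hypothetical
best_so_far = [0.006, 60.0]
print(perturb(candidate, best_so_far, rng=rng))
```

Most draws are small, but occasional large jumps occur, which is the property that helps a swarm optimizer escape local optima.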

Table 5 Comparison of prediction errors across models

Topic      Model         MAPE     RMSE     MAE
Topic 0    LSTM          0.1836   0.0803   0.0734
Topic 0    BiLSTM        0.1702   0.0782   0.0692
Topic 0    IWOA-BiLSTM   0.1658   0.0707   0.0663
Topic 1    LSTM          0.0293   0.0163   0.0175
Topic 1    BiLSTM        0.0257   0.0135   0.0146
Topic 1    IWOA-BiLSTM   0.0225   0.0092   0.0082
Topic 7    LSTM          0.0894   0.0478   0.0396
Topic 7    BiLSTM        0.0857   0.0425   0.0374
Topic 7    IWOA-BiLSTM   0.0825   0.0384   0.0341
Topic 8    LSTM          0.0216   0.0246   0.0132
Topic 8    BiLSTM        0.0193   0.0215   0.0106
Topic 8    IWOA-BiLSTM   0.0168   0.0116   0.0085
Topic 10   LSTM          0.0347   0.0182   0.0182
Topic 10   BiLSTM        0.0316   0.0163   0.0148
Topic 10   IWOA-BiLSTM   0.0285   0.0138   0.0123
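For reference, the three error metrics reported in Table 5 follow standard definitions, sketched below. MAPE is returned as a fraction rather than a percentage, which is an assumption based on the magnitude of the tabulated values. The sample series are hypothetical and are not the paper's data.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction (actual must be nonzero)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical emergence scores and forecasts, for illustration only.
actual = [0.50, 0.40]
predicted = [0.45, 0.44]
print(mape(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

All three metrics are lower-is-better, so a model with smaller values in every column of Table 5 is the more accurate one under these definitions.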

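Table 6 Forecasted degrees of emerging technology topics from 2025 to 2029

Topic      Description                          2025    2026    2027    2028    2029
Topic 0    Intelligent Models and Algorithms    0.426   0.467   0.532   0.548   0.567
Topic 1    Information Processing               0.334   0.367   0.430   0.422   0.453
Topic 7    Smart Robotics                       0.455   0.449   0.458   0.466   0.461
Topic 8    Numerical Control Systems            0.425   0.406   0.440   0.495   0.504
Topic 10   Deep Neural Networks                 0.399   0.426   0.429   0.446   0.440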