
Journal of Library and Information Science in Agriculture ›› 2024, Vol. 36 ›› Issue (4): 45-62. doi: 10.13998/j.cnki.issn1002-1248.24-0158


Methodology for Assessing the Influence of Technical Topics Based on PhraseLDA-SNA and Machine Learning

XIANG Rui1, SUN Wei1,2,*   

  1. Institute of Agricultural Information, Chinese Academy of Agricultural Sciences, Beijing 100081;
  2. Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081
  • Received: 2024-02-29 Online: 2024-04-05 Published: 2024-07-29

Abstract: [Purpose/Significance] Accurately measuring the influence of technical topics helps decision-makers understand development trends in a technology sector, and it is a key step in identifying emerging, cutting-edge, and disruptive technical topics. Traditional measurement methods are strongly affected by delays in patent approval and citation, lack a forward-looking view of a topic's potential influence, and extract technical topics with insufficient semantic richness. This paper presents a method for measuring technical topic influence based on PhraseLDA-SNA and machine learning. It aims to mitigate the impact of patent approval and citation delays while improving the interpretability and accuracy of the measurement results. [Method/Process] The explicit and implicit determinants of technical topic influence were first analyzed, and an index system for measuring technical topic influence was constructed on that basis. The PhraseLDA model was then used to extract semantically rich technical topics from a large corpus of pre-processed patent texts and to compute the topic-patent association probabilities. PhraseLDA-SNA enhances the semantic richness of technical topic extraction and deepens the analysis of topic content, while machine learning methods are used to predict the high-citation potential of the patents related to each topic. Integrating the two allows the method to measure the importance and the advanced (leading) nature of technical topics in promoting field development, and thereby their influence. Finally, an empirical study was conducted in the field of cellulose biodegradation to compare the high-impact technical topics identified by the proposed method with those identified by the traditional method. Several experts with high academic influence and extensive research experience in cellulose biodegradation were invited to evaluate the high-impact technical topics identified in this study, thereby validating the effectiveness of the proposed method. [Results/Conclusions] Compared with the traditional method, the proposed approach based on PhraseLDA-SNA and machine learning reveals more in-depth topic content. It also analyzes the importance and leading nature of technical topics, which gives it an advantage in quantitative analysis. When the patents related to the high-impact topics identified by the two methods are compared across years, the topics identified by the proposed method have a higher association ratio in the most recent data, indicating a significant reduction in the impact of patent approval and citation delays.
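
As a rough illustration of how topic-patent association probabilities and predicted high-citation potential might be combined, the sketch below computes an association-weighted score per topic and the share of a topic's association mass that comes from recent patents. The function names `topic_scores` and `recent_share`, the toy data, and the weighting scheme itself are assumptions made for illustration only; the abstract does not state the paper's index system or formulas, so this should not be read as the authors' method.

```python
def topic_scores(theta, p_high):
    """Association-weighted mean of predicted high-citation probability per topic.

    theta[d][k]: association probability of patent d with topic k
                 (e.g. a document-topic distribution from a topic model).
    p_high[d]:   hypothetical classifier output, the predicted probability
                 that patent d becomes highly cited (an assumption here).
    """
    n_topics = len(theta[0])
    scores = []
    for k in range(n_topics):
        mass = sum(row[k] for row in theta)
        weighted = sum(row[k] * p for row, p in zip(theta, p_high))
        scores.append(weighted / mass if mass > 0 else 0.0)
    return scores


def recent_share(theta, years, topic, since):
    """Fraction of a topic's association mass from patents dated `since` or later."""
    total = sum(row[topic] for row in theta)
    recent = sum(row[topic] for row, y in zip(theta, years) if y >= since)
    return recent / total if total > 0 else 0.0


# Toy data (invented for illustration): three patents, two topics.
theta = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
p_high = [0.8, 0.1, 0.4]
years = [2023, 2015, 2021]

scores = topic_scores(theta, p_high)
share_topic0 = recent_share(theta, years, topic=0, since=2021)
```

In this toy setup, a topic whose associated patents are both recent and predicted to be highly cited scores well even before those patents have accumulated citations, which is the kind of delay-robust signal the abstract describes.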

Key words: topic mining, patent, influence measurement, machine learning, intellectual property, technology forecasting

CLC Number: G353.1