Journal of Library and Information Science in Agriculture ›› 2024, Vol. 36 ›› Issue (4): 45-62. doi: 10.13998/j.cnki.issn1002-1248.24-0158

• Research Paper •

Methodology for Assessing the Influence of Technical Topics Based on PhraseLDA-SNA and Machine Learning

XIANG Rui1, SUN Wei1,2,*

  1. Institute of Agricultural Information, Chinese Academy of Agricultural Sciences, Beijing 100081;
  2. Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081
  • Received: 2024-02-29 Online: 2024-04-05 Published: 2024-07-29
  • Corresponding author: *SUN Wei (1973- ), female, PhD, research professor and doctoral supervisor; research interests: information retrieval visualization, data mining, and knowledge organization. Email: sunwei@caas.cn
  • About the author: XIANG Rui (1999- ), master's student; research interest: information management and information systems
  • Funding:
    National Key Research and Development Program of China project "Key Technologies and Software for Deep Content Mining and Intelligent Analysis of Scientific and Technical Literature" (2022YFF0711900)

Abstract: [Purpose/Significance] Accurately measuring the influence of technical topics helps decision-makers understand development trends in a technology sector and is a key step in identifying emerging, cutting-edge, and disruptive technical topics; it can also provide a theoretical reference for science and technology policy and for optimizing resource allocation. Traditional measurement methods are strongly affected by the lag in patent grants and citations, lack a forward-looking view of a topic's potential influence, and extract technical topics with limited semantic richness. This paper proposes a method for measuring technical topic influence based on PhraseLDA-SNA and machine learning. It aims to mitigate the effect of patent grant and citation lag while improving the interpretability and accuracy of the measurement results.

[Method/Process] The study first analyzes the explicit and implicit determinants of technical topic influence and, on that basis, constructs an index system for measuring it. The PhraseLDA model is then used to extract semantically rich technical topics from a large corpus of preprocessed patent texts and to compute topic-patent association probabilities. PhraseLDA-SNA enriches the semantics of the extracted topics and deepens the analysis of topic content, while machine learning is used to predict whether the patents associated with each topic are likely to become highly cited. By integrating the two, the method measures the significance and advanced nature of each technical topic in promoting the development of the field, and from these derives the topic's influence. Finally, an empirical study in the field of cellulose biodegradation compares the high-impact technical topics identified by the proposed method with those identified by the traditional method. Several experts with high academic influence and extensive research experience in cellulose biodegradation were invited to evaluate the high-impact topics identified in this study, validating the effectiveness of the method.

[Results/Conclusions] Compared with the traditional method, the proposed approach reveals topic content in greater depth. It also analyzes the importance and leading nature of technical topics, which is an advantage in quantitative analysis. When the patents associated with the high-impact topics identified by the two methods are compared by year, the topics identified by the proposed method have a higher association ratio in the most recent data, indicating that the method is significantly less affected by patent grant and citation lag.

Key words: topic mining, patent, influence measurement, machine learning, intellectual property, technology forecasting
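
The abstract does not give the formulas of the index system, so the following is only a minimal sketch of how the two ingredients it names, topic-patent association probabilities and predicted high-citation potential, could be combined and checked against recency. The function names, the association-weighted mean, the 0.5 association threshold, and the toy data are all assumptions made for illustration; they are not the authors' method.

```python
from typing import List


def topic_citation_potential(theta: List[List[float]], p_high: List[float]) -> List[float]:
    """Association-weighted mean of predicted high-citation probability per topic.

    theta[d][k]: association probability between patent d and topic k
                 (e.g. produced by a topic model).
    p_high[d]:   a classifier's predicted probability that patent d
                 becomes highly cited.
    NOTE: this aggregation is a hypothetical choice, not the paper's formula.
    """
    n_topics = len(theta[0])
    scores = []
    for k in range(n_topics):
        weight = sum(row[k] for row in theta)
        weighted = sum(row[k] * p for row, p in zip(theta, p_high))
        scores.append(weighted / weight if weight > 0 else 0.0)
    return scores


def recent_patent_share(theta: List[List[float]], years: List[int],
                        topic: int, since: int, min_assoc: float = 0.5) -> float:
    """Share of a topic's associated patents dated in or after `since`.

    A patent counts as associated when theta[d][topic] >= min_assoc
    (the threshold is an assumption for this sketch).
    """
    linked = [y for row, y in zip(theta, years) if row[topic] >= min_assoc]
    if not linked:
        return 0.0
    return sum(1 for y in linked if y >= since) / len(linked)


# Toy data: 4 patents, 2 topics (values invented for illustration only).
theta = [[0.9, 0.1],
         [0.2, 0.8],
         [0.6, 0.4],
         [0.1, 0.9]]
p_high = [0.8, 0.3, 0.5, 0.1]
years = [2022, 2015, 2023, 2016]

scores = topic_citation_potential(theta, p_high)
share_topic0 = recent_patent_share(theta, years, topic=0, since=2020)
share_topic1 = recent_patent_share(theta, years, topic=1, since=2020)
print(scores, share_topic0, share_topic1)
```

In this toy case the topic whose associated patents are recent also receives the higher score, which is the kind of pattern the year-by-year comparison in the abstract looks for. A score built from predicted rather than observed citations does not need to wait for citations to accumulate.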

CLC Number: G353.1

Cite this article

XIANG Rui, SUN Wei. Methodology for Assessing the Influence of Technical Topics Based on PhraseLDA-SNA and Machine Learning[J]. Journal of Library and Information Science in Agriculture, 2024, 36(4): 45-62.