
Agricultural Library and Information, 2019, Vol. 31, Issue (4): 19-28. doi: 10.13998/j.cnki.issn1002-1248.2019.03.19-0342

• Research paper •

Research of Topics Discovery and Tech Evolution Based on Text Preprocessed LDA Model

WANG Li1,2, SHEN Xiang1,2   

  1. National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
  2. Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2019-04-24 Online: 2019-04-05 Published: 2019-06-21

Abstract: [Objective] Computational science and data science are driving intelligent analysis and information services, and machine-learning text analysis is changing traditional analysis methods. This article discusses the benefits of unsupervised learning approaches in patent text mining. [Methods] Patent data from the SiC industry were preprocessed with a filter model based on the NLTK toolkit to identify technical terms, then clustered with a Latent Dirichlet Allocation (LDA) model to find latent topics, which were visualized. Using a group operation, the top terms ranked by tf-idf in each year were used to reveal the evolution of the R&D focus. [Results] The proposed method is demonstrated on 43,621 SiC patents. The results show 28 R&D topics with their technical terms in the SiC industry and present the evolution of the R&D focus based on the newly emerging terms of each year, which provides clues for more detailed analysis later. Finally, we discuss these clues to the R&D focus in the SiC industry. [Limitations] Multiple topics per document were not compared in the analysis of R&D focus evolution in this article; this will be discussed in future work. [Conclusions] The results show an efficient way to find the evolution of technology focus in large-scale text data.
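
The yearly ranking step described under [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the technical terms have already been extracted (the NLTK filter and the LDA clustering are out of scope here), it uses a smoothed idf of my own choosing, and the function names `top_terms_by_year` and `emerging_terms` as well as the toy records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def top_terms_by_year(records, k=3):
    """Rank terms per year by tf-idf.

    records: iterable of (year, [terms]) pairs, one pair per patent.
    Returns {year: [top-k terms]}.
    """
    # Group operation: pool all terms of one year into a single bag.
    yearly = defaultdict(Counter)
    for year, terms in records:
        yearly[year].update(terms)

    n_years = len(yearly)
    # Document frequency: number of yearly bags in which a term occurs.
    df = Counter(term for bag in yearly.values() for term in bag)

    ranked = {}
    for year, bag in yearly.items():
        total = sum(bag.values())
        # Assumption: smoothed idf, log((1 + N) / (1 + df)) + 1.
        scores = {
            term: (count / total) * (math.log((1 + n_years) / (1 + df[term])) + 1)
            for term, count in bag.items()
        }
        # Sort by descending score, then alphabetically to break ties.
        ranked[year] = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return ranked


def emerging_terms(ranked):
    """Keep, for each year, the top terms absent from all earlier top lists."""
    seen, out = set(), {}
    for year in sorted(ranked):
        out[year] = [t for t in ranked[year] if t not in seen]
        seen.update(ranked[year])
    return out


# Hypothetical toy records, not data from the study.
records = [
    (2015, ["sic substrate", "epitaxial layer", "sic substrate"]),
    (2015, ["epitaxial layer", "crystal growth"]),
    (2016, ["sic substrate", "trench mosfet", "trench mosfet"]),
    (2016, ["gate oxide", "trench mosfet"]),
]
ranked = top_terms_by_year(records, k=3)
new_terms = emerging_terms(ranked)
```

In this toy run, "sic substrate" stays in the 2016 top list but is not reported as emerging because it already appeared in 2015, which is the kind of year-over-year contrast the abstract relies on.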

Key words: LDA model, tech evolution, preprocessed text, visualization, automatic term identification

CLC Number: TP393