Agricultural Library and Information, 2019, Vol. 31, Issue (6): 21-30. doi: 10.13998/j.cnki.issn1002-1248.2019.06.19-0550

• Research paper •

Topical Analysis of Scientific and Technical Reports based on Topical N-Grams Model

AN Xin1, XU Shuo2   

  1. School of Economics and Management, Beijing Forestry University, Beijing 100083, China;
  2. Research Base of Beijing Modern Manufacturing Development, College of Economics and Management, Beijing University of Technology, Beijing 100124, China
  • Received: 2019-06-24  Online: 2019-06-05  Published: 2019-08-02

Abstract: As an important carrier of scientific and technical (S&T) intelligence, S&T reports reflect the course of S&T development, reveal the latest advances at the S&T frontier, and can even offer insight into development trends. Research in China on developing and using S&T reports has focused mainly on the publication and distribution of S&T reports in book and electronic form, database construction, service modes, and intellectual property. Deep data mining of S&T reports remains largely under-studied. This work uses the topical n-grams model to discover latent domain topics in S&T reports. To determine the number of topics for a specific domain, this study proposes a method for calculating the perplexity of the topical n-grams model using dynamic programming. Finally, 70 domain topics are discovered from 1,344 S&T reports in the tumor domain, such as "molecular mechanisms/tumor cells" and "system biology/key methods". Experimental results show that discovering latent topics in S&T reports with the topical n-grams model is feasible and efficient.

Key words: scientific and technical reports, topical n-grams model, topical analysis, perplexity, heat map

CLC Number: 

  • G322
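
The abstract mentions computing the perplexity of the topical n-grams model with dynamic programming but does not spell out the procedure. The following is only a minimal sketch of one way such a calculation could look, not the authors' method. It assumes the standard generative story of the topical n-grams model: each word's topic is drawn from the document's topic proportions, and a bigram switch that depends on the previous word and the previous topic decides whether the word comes from a topic-specific unigram or bigram distribution. All names (`theta`, `phi`, `sigma`, `psi`, `tng_log_likelihood`, `perplexity`) are hypothetical and the parameters are treated as fixed point estimates.

```python
import math


def tng_log_likelihood(words, theta, phi, sigma, psi):
    """Log-likelihood of one document via a forward (dynamic programming) pass.

    Hypothetical parameter layout (assumptions for this sketch only):
      words         : list of word ids
      theta[z]      : topic proportions of this document
      phi[z][w]     : unigram probability of word w under topic z
      sigma[z][v][w]: probability of word w after word v under topic z
      psi[z][v]     : probability that the next word forms a bigram with v,
                      given that v was assigned topic z
    """
    K = len(theta)
    # First word has no predecessor, so only the unigram path applies.
    alpha = [theta[z] * phi[z][words[0]] for z in range(K)]
    norm = sum(alpha)
    log_lik = math.log(norm)
    alpha = [a / norm for a in alpha]  # rescale to avoid underflow

    for prev, cur in zip(words, words[1:]):
        # Marginalize the bigram switch over the previous word's topic.
        p_bigram = sum(alpha[z] * psi[z][prev] for z in range(K))
        alpha = [
            theta[z] * ((1.0 - p_bigram) * phi[z][cur]
                        + p_bigram * sigma[z][prev][cur])
            for z in range(K)
        ]
        norm = sum(alpha)
        log_lik += math.log(norm)
        alpha = [a / norm for a in alpha]
    return log_lik


def perplexity(docs, thetas, phi, sigma, psi):
    """exp of the negative average per-word log-likelihood over all documents."""
    total_ll = sum(
        tng_log_likelihood(d, t, phi, sigma, psi) for d, t in zip(docs, thetas)
    )
    total_words = sum(len(d) for d in docs)
    return math.exp(-total_ll / total_words)
```

Under these assumptions, one would train the model for several candidate topic counts and prefer the count with the lowest perplexity on held-out reports. Lower perplexity means the model assigns higher probability to unseen text.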