Journal of Library and Information Science in Agriculture, 2021, Vol. 33, Issue (1): 41-52. doi: 10.13998/j.cnki.issn1002-1248.20-0307

• Research paper •

A COPRA Based Algorithm for Subject Division

TI Huiying, GENG Qian, JIN Jian*   

  1. School of Government, Beijing Normal University, Beijing 100875
  • Received: 2020-04-23 Online: 2021-01-05 Published: 2021-02-05

Abstract: [Purpose/Significance] Online encyclopedias such as Wikipedia contain a large number of concepts. However, they draw no clear boundaries among concepts, between concepts and disciplines, or among disciplines. This makes it difficult for researchers new to a discipline to acquire domain-relevant knowledge systematically and efficiently. [Method/Process] To obtain information in a specific subject area and organize knowledge better, a new algorithm is designed for subject division. Specifically, approaches from complex network analysis are introduced: a topic text network is first built with a classical topic model, and the overlapping community label propagation algorithm (COPRA) is then improved to identify the boundaries between subject divisions. [Results/Conclusions] A sample of 300 Wikipedia entry texts was used to evaluate the effectiveness of the proposed algorithm. Several groups of experiments were conducted to analyze the community structure of the entry network and the complexity of the subject division. The results help provide a corpus for building a subject knowledge base.
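The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' improved algorithm, whose details the abstract does not give. It assumes that each entry already has a topic distribution (for example from a topic model), that entries are linked when the cosine similarity of their distributions reaches a threshold, and that labels spread by one synchronous step of the basic COPRA update. The function names, the threshold of 0.8, the parameter v = 2, and the deterministic tie-break are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two topic distributions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_network(topic_vectors, threshold=0.8):
    """Link two entries when their topic distributions are similar enough.

    The threshold rule is an assumption made for illustration only.
    """
    n = len(topic_vectors)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(topic_vectors[i], topic_vectors[j]) >= threshold:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def init_labels(neighbors):
    """Every node starts in its own community with belonging coefficient 1."""
    return {x: {x: 1.0} for x in neighbors}


def copra_step(neighbors, labels, v=2):
    """One synchronous update of the basic COPRA rule.

    Each node averages its neighbors' belonging coefficients, drops labels
    whose coefficient falls below 1/v, and renormalizes. A node may keep up
    to v labels, which is what allows communities to overlap.
    """
    updated = {}
    for x, nbrs in neighbors.items():
        if not nbrs:
            # Isolated nodes keep their current labels.
            updated[x] = dict(labels[x])
            continue
        summed = {}
        for y in nbrs:
            for label, coeff in labels[y].items():
                summed[label] = summed.get(label, 0.0) + coeff / len(nbrs)
        kept = {l: c for l, c in summed.items() if c >= 1.0 / v}
        if not kept:
            # COPRA breaks such ties at random; the smallest label is used
            # here only to keep the sketch deterministic.
            best = min(summed, key=lambda l: (-summed[l], l))
            kept = {best: summed[best]}
        total = sum(kept.values())
        updated[x] = {l: c / total for l, c in kept.items()}
    return updated


# Hypothetical topic distributions for six entries over two topics.
vectors = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
           [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]
network = build_network(vectors)
labels = copra_step(network, init_labels(network))
print(network)
print(labels)
```

A full run would repeat `copra_step` until a stopping rule is met and would break ties at random, both of which are omitted here. In the toy data the six entries form two disconnected triangles, so labels never cross from one group to the other.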

Key words: subject division, complex network, overlapping community, topic model

CLC Number: G302