Journal of Library and Information Science in Agriculture ›› 2021, Vol. 33 ›› Issue (1): 41-52. doi: 10.13998/j.cnki.issn1002-1248.20-0307

• Research Article •

A COPRA Based Algorithm for Subject Division

TI Huiying, GENG Qian, JIN Jian*

  1. School of Government, Beijing Normal University, Beijing 100875
  • Received: 2020-04-23 Online: 2021-01-05 Published: 2021-02-05
  • Corresponding author: *JIN Jian (ORCID: 0000-0002-3239-2294), associate professor, Ph.D. Email: jinjian.jay@bnu.edu.cn
  • About the authors: TI Huiying (ORCID: 0000-0003-2885-7252), female, undergraduate, School of Government, Beijing Normal University. GENG Qian (ORCID: 0000-0001-5064-4996), professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China project "Extraction and Comparison of Differentiated Customer Needs: A Mining Analysis Based on Online Product Reviews" (71701019)

Abstract: [Purpose/Significance] Online encyclopedias such as Wikipedia contain a vast number of concepts. However, such encyclopedias draw no clear boundaries between concepts, between concepts and disciplines, or between disciplines. This makes it difficult for newcomers to a discipline to acquire knowledge of the field efficiently and systematically. [Method/Process] To retrieve information about a specific subject area and organize knowledge better, this study proposes a method for delineating the boundaries between disciplines. It introduces complex network analysis into text topic partitioning: a topic model is used to build a topic text network, and an improved overlapping community label propagation algorithm (COPRA) is then applied to that network to identify subject boundaries. [Results/Conclusions] The method was evaluated on a sample of 300 Wikipedia entry texts, which demonstrated its effectiveness. Several groups of experiments analyzed the community structure of the entry network and the complexity of the subject areas, laying a foundation for building a subject-domain knowledge base.

Key words: subject division, complex network, overlapping community, topic model
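
The pipeline summarized in the abstract (topic vectors, then a text network, then overlapping label propagation) can be illustrated with a minimal Python sketch. It is not the authors' improved algorithm, which the abstract does not specify; it is a plain COPRA-style propagation with synchronous updates. The topic vectors are invented stand-ins for topic-model output, linking entries by a cosine-similarity threshold of 0.6 is an assumption, and the names `build_graph`, `copra`, and the parameter `v` (maximum labels per entry) are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_graph(topic_vectors, threshold):
    """Link two entries when their topic vectors are similar enough (assumed rule)."""
    names = sorted(topic_vectors)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(topic_vectors[a], topic_vectors[b]) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def copra(graph, v=2, max_iter=20):
    """COPRA-style propagation: each node keeps at most v labels with coefficients."""
    labels = {node: {node: 1.0} for node in graph}  # every node starts with its own label
    for _ in range(max_iter):
        new_labels = {}
        for node, neighbours in graph.items():
            if not neighbours:  # isolated node keeps its current labels
                new_labels[node] = dict(labels[node])
                continue
            # Sum the neighbours' belonging coefficients per label, then normalise.
            acc = defaultdict(float)
            for nb in neighbours:
                for label, coef in labels[nb].items():
                    acc[label] += coef
            total = sum(acc.values())
            acc = {label: coef / total for label, coef in acc.items()}
            # Drop labels whose coefficient falls below 1/v.
            kept = {label: coef for label, coef in acc.items() if coef >= 1.0 / v}
            if not kept:
                # Keep the strongest label; ties go to the alphabetically first one
                # (a deterministic simplification; COPRA picks at random).
                best = max(sorted(acc), key=acc.get)
                kept = {best: 1.0}
            norm = sum(kept.values())
            new_labels[node] = {label: coef / norm for label, coef in kept.items()}
        if new_labels == labels:  # no change: stop early
            break
        labels = new_labels
    return labels


# Invented topic distributions for five entries (three topics each).
topic_vectors = {
    "algebra": [0.9, 0.1, 0.0],
    "calculus": [0.8, 0.2, 0.0],
    "genetics": [0.0, 0.1, 0.9],
    "ecology": [0.1, 0.1, 0.8],
    "biostatistics": [0.5, 0.0, 0.5],
}
graph = build_graph(topic_vectors, threshold=0.6)
memberships = copra(graph, v=2)
```

Each entry in `memberships` maps a label to a belonging coefficient, and an entry holding more than one label would be read as belonging to several subject areas at once. On a graph this small the propagation may well collapse to a single label, so the toy data only demonstrates the mechanics.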

Chinese Library Classification (CLC) number:

  • G302

Cite this article

TI Huiying, GENG Qian, JIN Jian. A COPRA Based Algorithm for Subject Division[J]. Journal of Library and Information Science in Agriculture, 2021, 33(1): 41-52.