Journal of Library and Information Science in Agriculture ›› 2025, Vol. 37 ›› Issue (3): 4-17. doi: 10.13998/j.cnki.issn1002-1248.25-0218

• Invited Article •

Research on DeepSeek-Empowered Low-Cost Construction of Domain-Specific Knowledge Graphs

SHI Zhongyan1, LEI Jie1, SUN Tan1,2, ZHAO Ruixue1,3, LI Jiao1, HUANG Yongwen1, XIAN Guojian1,2

  1. Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081
  2. Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081
  3. Key Laboratory of Knowledge Mining and Knowledge Services in Agricultural Converging Publishing, National Press and Publication Administration, Beijing 100081
  • Received: 2025-01-22 Online: 2025-03-05 Published: 2025-06-10
  • Contact: XIAN Guojian E-mail: xianguojian@caas.cn
  • About the authors: SHI Zhongyan, master's student; research interest: knowledge graphs
    LEI Jie, PhD, assistant researcher; research interests: information resource management, knowledge organization
    SUN Tan, PhD, research librarian (Grade 2); research interest: description and organization of digital information
    ZHAO Ruixue, PhD, researcher; research interest: agricultural information management systems
    LI Jiao, PhD, associate researcher; research interests: knowledge graphs and knowledge services
    HUANG Yongwen, PhD, researcher; research interests: knowledge organization and knowledge services
  • Funding:
    National Social Science Fund of China, General Program, "Research on Semantic Organization and Association Discovery Services for Multimodal Scientific and Technological Resources" (22BTQ079); China Association for Science and Technology Young Elite Scientists Sponsorship Program, "Research on Semantic Recognition and Parsing of Scientific Argumentation in Research Papers" (2022QNRC001)

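Abstract (translated from the Chinese):

[Purpose/Significance] As open-source large language models such as DeepSeek drive a paradigm shift in knowledge engineering, this study addresses bottlenecks in traditional domain knowledge graph construction, namely heavy reliance on expert rules, high manual annotation costs, and inefficient processing of multi-source data, and proposes a low-cost method for constructing domain knowledge graphs based on DeepSeek. [Method/Process] We build a methodological framework of ontology modeling, data fusion, and intelligent extraction: the ontology model is designed from domain cognitive features, a multi-source heterogeneous data fusion method gives the data a unified structural representation, and DeepSeek is combined with knowledge extraction to form a domain knowledge extraction technology system based on enhanced semantic understanding and prompt engineering. [Results/Conclusions] Taking the knowledge graph of the whole pig industry chain as the empirical case, we define the industry chain structure and 21 types of core entities with their attributes and relations, achieving knowledge modeling of the pig industry oriented toward smart farming. Experiments show that DeepSeek-R1, under zero-shot conditions, reaches an F1 score of 0.92 for entity recognition in the pig disease prevention and control scenario. The study offers a collaborative "machine pre-screening, manual refinement" paradigm for domain knowledge graph construction, verifies the potential of large language models for knowledge extraction in vertical domains, and provides research value and practical reference for DeepSeek-empowered low-cost knowledge graph construction.

Keywords: DeepSeek, knowledge extraction, knowledge graph, zero-shot, knowledge foundation, pig, whole industry chain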

Abstract:

[Purpose/Significance] In recent years, large language models (LLMs) have achieved major breakthroughs in semantic understanding and generation through pre-training on massive text corpora, giving new impetus to knowledge engineering. As a structured knowledge carrier, the knowledge graph has distinct advantages in integrating heterogeneous data from multiple sources and in building an industry knowledge system. Against the backdrop of a paradigm shift in knowledge engineering driven by open-source LLMs such as DeepSeek, this study proposes a cost-effective method for constructing domain knowledge graphs based on DeepSeek. It aims to address the limitations of traditional domain knowledge graph construction: heavy dependence on expert rules, the high cost of manual annotation, and inefficient processing of multi-source data. [Method/Process] We propose a domain knowledge extraction technology system based on enhanced semantic understanding and prompt engineering, built on a methodological framework of manual ontology modelling, data fusion, and intelligent extraction. The ontology model was designed from domain cognitive features. The acquired data were processed with ETL, MinerU, and other tools, and a multi-source heterogeneous data fusion method was used to give the data a unified structural representation. The DeepSeek-R1 application programming interface (API) was then invoked for intelligent extraction, combining DeepSeek with knowledge extraction. By leveraging the strong textual reasoning ability of the DeepSeek model, the system provides a cost-effective, reusable technical paradigm for constructing domain knowledge graphs and for efficient knowledge extraction. [Results/Conclusions] We take the construction of a domain knowledge graph for the entire pig industry chain as the empirical case. We define the structure of the industry chain, identify 21 types of core entities, and describe their attributes and relationships, achieving knowledge modelling of the pig industry oriented toward smart farming. The methodology was also applied to process and extract knowledge from online and offline resource data. Preliminary experiments show that, under zero-shot conditions, DeepSeek-R1 achieves an F1 score of 0.92 when recognizing the attributes of 161 diseases and 11 types of entities in pig disease control scenarios. The experiments also indicate that the methodology can be reused for other links in the chain. The constructed knowledge graph of the entire pig industry chain will be used to design and validate intelligent application scenarios, with the aim of promoting intelligent information processing in the pig industry. This study proposes a collaborative paradigm for constructing domain knowledge graphs with DeepSeek that combines machine pre-screening with manual calibration, so that knowledge extraction is efficient while accuracy is maintained. It verifies the knowledge extraction potential of LLMs in vertical domains, contributes to the existing literature, and offers a practical reference for DeepSeek-enabled, cost-effective construction of knowledge graphs.

Key words: DeepSeek, knowledge extraction, knowledge graph, zero-shot learning, knowledge foundation, swine, whole industry chain
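
The zero-shot extraction and F1 evaluation summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the entity type names, the prompt wording, and the function names `build_zero_shot_prompt`, `parse_entities`, and `precision_recall_f1` are hypothetical, and the model call is replaced by a hard-coded reply so the example runs without network access. The sample sentence and labels are toy data, not results from the paper.

```python
import json
from typing import Iterable

# Hypothetical entity types; the abstract does not list the paper's 11 types.
ENTITY_TYPES = ["Disease", "Symptom", "Pathogen", "Drug"]


def build_zero_shot_prompt(text: str, entity_types: Iterable[str]) -> str:
    """Build a zero-shot prompt: task instructions only, no labelled examples."""
    types = ", ".join(entity_types)
    return (
        "Extract entities from the text below.\n"
        f"Allowed entity types: {types}.\n"
        'Answer with a JSON list of objects like {"text": "...", "type": "..."}.\n'
        f"Text: {text}"
    )


def parse_entities(raw_reply: str, entity_types: Iterable[str]) -> set[tuple[str, str]]:
    """Parse a JSON reply into (text, type) pairs, dropping malformed or unknown items."""
    allowed = set(entity_types)
    try:
        items = json.loads(raw_reply)
    except json.JSONDecodeError:
        return set()
    if not isinstance(items, list):
        return set()
    entities = set()
    for item in items:
        if not isinstance(item, dict):
            continue
        text, etype = item.get("text"), item.get("type")
        if isinstance(text, str) and text.strip() and etype in allowed:
            entities.add((text.strip(), etype))
    return entities


def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over (text, type) pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = "African swine fever is caused by ASFV and presents with high fever."
    prompt = build_zero_shot_prompt(sentence, ENTITY_TYPES)
    # Stand-in for a model reply; a real pipeline would send `prompt` to the model API.
    fake_reply = json.dumps([
        {"text": "African swine fever", "type": "Disease"},
        {"text": "ASFV", "type": "Pathogen"},
        {"text": "high fever", "type": "Disease"},  # deliberately wrong type
    ])
    predicted = parse_entities(fake_reply, ENTITY_TYPES)
    gold = {
        ("African swine fever", "Disease"),
        ("ASFV", "Pathogen"),
        ("high fever", "Symptom"),
    }
    p, r, f1 = precision_recall_f1(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In a workflow that pairs machine pre-screening with manual calibration, the parsed pairs would be the candidates handed to a human reviewer, and the same F1 function could be applied to a manually labelled sample to check extraction quality.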

CLC Number: G254.9

Cite this article

史忠艳, 雷洁, 孙坦, 赵瑞雪, 李娇, 黄永文, 鲜国建. DeepSeek赋能领域知识图谱低成本构建研究[J]. 农业图书情报学报, 2025, 37(3): 4-17.

SHI Zhongyan, LEI Jie, SUN Tan, ZHAO Ruixue, LI Jiao, HUANG Yongwen, XIAN Guojian. Research on DeepSeek-Empowered Low-Cost Construction of Domain-Specific Knowledge Graphs[J]. Journal of Library and Information Science in Agriculture, 2025, 37(3): 4-17.