中文    English

Journal of library and information science in agriculture ›› 2025, Vol. 37 ›› Issue (3): 4-17.doi: 10.13998/j.cnki.issn1002-1248.25-0218

    Next Articles

Research on DeepSeek-Empowered Low-Cost Construction of Domain-Specific Knowledge Graphs

SHI Zhongyan1, LEI Jie1, SUN Tan1,2, ZHAO Ruixue1,3, LI Jiao1, HUANG Yongwen1, XIAN Guojian1,2()   

  1. 1.Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081
    2.Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081
    3.Key Laboratory of Knowledge Mining and Knowledge Services in Agricultural Converging Publishing, National Press and Publication Administration, Beijing 100081
  • Received:2025-01-22 Online:2025-03-05 Published:2025-06-10
  • Contact: XIAN Guojian E-mail:xianguojian@caas.cn

Abstract:

[Purpose/Significance] In recent years, large language models (LLMs) have achieved revolutionary breakthroughs in semantic understanding and generation capabilities through massive text pre-training. This has injected brand-new impetus into the field of knowledge engineering. As a structured knowledge carrier, the knowledge graph has unique advantages in integrating heterogeneous data from multiple sources and constructing an industrial knowledge system. In the context of a paradigm shift in the field of knowledge engineering driven by the emergence of open-source LLMs such as DeepSeek, this study proposes a cost-effective method for constructing domain knowledge graphs based on DeepSeek. We aim to address the limitations of traditional domain knowledge graphs, such as high dependence on expert rules, the high cost of manual annotation, and inefficient processing of multi-source data. [Method/Process] We proposed the semantic understanding-enhanced, cue-engineered domain knowledge extraction technology system, constructed on the methodological framework of manually constructing ontology modelling. In order to process the acquired data, the ETL\MinerU and other tools were used, and the DeepSeek-R1application programming interface was then invoked for intelligent extraction. The ontology model was designed based on domain cognitive features and the multi-source heterogeneous data fusion method was used to achieve the unified characterization of the data structure. Furthermore, the DeepSeek and knowledge extraction were combined. Our system provides a cost-effective reusable technical paradigm for constructing domain knowledge graphs, as well as efficient knowledge extraction, leveraging the advanced powerful textual reasoning ability of the DeepSeek model. [Results/Conclusions] In this study, we take the construction of a domain knowledge map of the entire pig industrial chain as an empirical object. We define the structure of the industrial chain, identify 21 types of core entities and describe their attribute relationships. We achieve the knowledge modelling of the pig industry with a focus on smart farming. The methodology developed in this research was also employed to process and extract knowledge from online and offline resource data. Preliminary experiments demonstrate that DeepSeek-R1 exhibits an F1 value of 0.92 when recognizing the attributes of 161 diseases and 11 types of entities in pig disease control scenarios under zero-sample learning conditions. These experiments also ascertain the reusability of the methodology for other links in the chain. Concurrently, the constructed knowledge map of the entire industrial chain of pigs will be utilized for the design and validation of intelligent application scenarios, with the objective of promoting the intelligent information processing in the pig industry. This study proposes a synergistic paradigm for constructing domain knowledge graphs using DeepSeek, a method that combines deep learning with manual calibration for efficient knowledge extraction and ensure accuracy. This approach ensures the efficiency of knowledge extraction and verifies the knowledge extraction potential of LLMs in vertical domains. The study's findings contribute to the extant literature and offer a practical reference for the promotion of DeepSeek-enabled cost-effective construction of knowledge graphs.

Key words: DeepSeek, knowledge extraction, knowledge graph, zero-shot learning, knowledge foundation, swine, whole industry chain

CLC Number: 

  • G254.9

Fig.1

Knowledge graph construction workflow based on DeepSeek-R1"

Table 1

Prompt frames"

框架介绍关键字段
RTF框架最简单的入门框架,适用于通用任务,快速问答、信息查询等

角色(ROLE):指定大模型角色,明确专业背景和承担角色

任务(TASK):定义具体任务或要解决的问题

格式(FORMAT):指定输出格式

ROSES框架将交互细分为5个核心部分,进行目的明确的交流,较RTF框架细化了其任务描述部分,适合需要明确角色和目标的交互,强调场景和解决方案,如咨询服务、问题解决等

角色(Role):指定大模型的角色

目标(Objective):描述要实现的目标或想要大模型完成的任务

场景(Scenario):提供与请求相关的背景信息或上下文

预期解决方案(Expected Solution):定义期望的结果

步骤(Steps):询问实现解决方案所需的具体步骤或操作

SAGE框架用于明确优化与人工智能模型的交互工程,适用于需要详细情况和行动的复杂任务

情况(Situation):描述任务执行的上下文或背景

行动(Action):明确需要进行的操作或步骤

目标(Goal):指出任务完成后应达到的目的或效果

预期 (Expectation):对输出结果的具体要求,包括格式、时间限制等

CoT模式CoT模式称为思维链模式,让大模型逐步参与将一个复杂问题分解为一步一步的子问题并依次进行求解的过程可以显著提升大模型的性能

指令(Instruction):用于描述问题并且告知大模型的输出格式

逻辑依据(Rationale):CoT的中间推理过程,可以包含问题的解决方案、中间推理步骤以及与问题相关的任何外部知识

示例(Exemplars):以少样本的方式为大模型提供输入输出对的基本格式

CoD模式由Salesforce、麻省理工学院和哥伦比亚大学的研究人员推出的一种提示方法,使用递归的方式来创建越来越好的输出提示,生成的文章摘要更加密集且适合理解。适用于总结性、长输出格式内容场景

指令(Instruction):明确大语言模型进行的任务和目的

步骤(Steps):设置执行任务步骤,并定义相关实体

指南(Guide):确定输出细节以及格式

Fig.2

​DeepSeek and prompt engineering framework diagram"

Fig.3

​Ontology model framework for the entire swine industry chain"

Fig.4

DeepSeek knowledge extraction results (partial)"

Table 2

​Field-wise recognition and extraction results"

字段名称PRF1
疾病类别0.990.960.97
中文名称1.000.960.98
中文别名1.000.970.99
英文名称1.000.950.97
病原0.720.950.82
病原类别0.610.940.74
症状0.930.950.94
传播途径0.970.960.97
易感时期0.930.950.94
治疗措施0.770.940.85
预防措施0.950.950.95

Fig.5

Pivot chart of identification and extraction results by field"

Fig.6

​Disease node: porcine enzootic pneumonia"

Fig.7

​Multi-symptom correlation for disease diagnosis"

Fig.8

Enquiry about cold season contact transmission diseases"

1 秦小林, 古徐, 李弟诚, 等. 大语言模型综述与展望[J]. 计算机应用, 2025, 45(3): 685-696.
QIN X L, GU X, LI D C, et al. Survey and prospect of large language models[J]. Journal of computer applications, 2025, 45(3): 685-696.
2 王萌, 王昊奋, 李博涵, 等. 新一代知识图谱关键技术综述[J]. 计算机研究与发展, 2022, 59(9): 1947-1965.
WANG M, WANG H F, LI B H, et al. Survey on key technologies of new generation knowledge graph[J]. Journal of computer research and development, 2022, 59(9): 1947-1965.
3 徐增林, 盛泳潘, 贺丽荣, 等. 知识图谱技术综述[J]. 电子科技大学学报, 2016, 45(4): 589-606.
XU Z L, SHENG Y P, HE L R, et al. Review on knowledge graph techniques[J]. Journal of university of electronic science and technology of China, 2016, 45(4): 589-606.
4 刘峤, 李杨, 段宏, 等. 知识图谱构建技术综述[J]. 计算机研究与发展, 2016, 53(3): 582-600.
LIU Q, LI Y, DUAN H, et al. Knowledge graph construction techniques[J]. Journal of computer research and development, 2016, 53(3): 582-600.
5 车万翔, 窦志成, 冯岩松, 等. 大模型时代的自然语言处理: 挑战、机遇与发展[J]. 中国科学: 信息科学, 2023, 53(9): 1645-1687.
CHE W X, DOU Z C, FENG Y S, et al. Towards a comprehensive understanding of the impact of large language models on natural language processing: Challenges, opportunities and future directions[J]. Scientia sinica (informationis), 2023, 53(9): 1645-1687.
6 XU D R, LI X H, ZHANG Z H, et al. Harnessing large language models for knowledge graph question answering via adaptive multi-aspect retrieval-augmentation[J/OL]. arXiv, 2024. .
7 李晓理, 刘春芳, 耿劭坤. 知识图谱与大语言模型协同共生模式及其教育应用综述[J/OL]. 计算机工程与应用, 2025: 1-15. .
LI X L, LIU C F, GENG S K. A survey of the collaborative symbiosis mode between knowledge graph and large language model and its education application[J/OL]. Computer engineering and applications, 2025: 1-15. .
8 韩普, 陈文祺, 叶东宇. 面向中文电子病历的多模态知识图谱构建方法研究[J]. 图书情报工作, 2024, 68(23): 30-40.
HAN P, CHEN W Q, YE D Y. Research on multimodal knowledge graph construction method for Chinese electronic medical record[J]. Library and information service, 2024, 68(23): 30-40.
9 毛瑞彬, 朱菁, 李爱文, 等. 基于自然语言处理的产业链知识图谱构建[J]. 情报学报, 2022, 41(3): 287-299.
MAO R B, ZHU J, LI A W, et al. Construction of knowledge graph of industry chain based on natural language processing[J]. Journal of the China society for scientific and technical information, 2022, 41(3): 287-299.
10 姚奕, 陈朝阳, 杜晓明, 等. 多模态知识图谱构建技术及其在军事领域的应用综述[J]. 计算机工程与应用, 2024, 60(22): 18-37.
YAO Y, CHEN Z Y, DU X M, et al. Survey of multimodal knowledge graph construction technology and its application in military field[J]. Computer engineering and applications, 2024, 60(22): 18-37.
11 陈怡然, 熊竹青, 周脚根, 等. 畜禽养殖业数据应用展望和问题分析[J]. 中国科学院院刊, 2024, 39(11): 1982-1993.
CHEN Y R, XIONG Z Q, ZHOU J G, et al. Prospect and problem analysis of industry data application in livestock and poultry breeding[J]. Bulletin of Chinese academy of sciences, 2024, 39(11): 1982-1993.
12 刘烨宸, 李华昱. 领域知识图谱研究综述[J]. 计算机系统应用, 2020, 29(6): 1-12.
LIU Y C, LI H Y. Survey on domain knowledge graph research[J]. Computer systems & applications, 2020, 29(6): 1-12.
13 NGUYEN H L, VU D T, JUNG J J. Knowledge graph fusion for smart systems: A Survey[J]. Information fusion, 2020, 61: 56-70.
14 张才科, 李小龙, 郑胜, 等. 基于大语言模型的知识图谱构建及应用研究[J]. 计算机科学与探索, 2024, 18(10): 2656-2667.
ZHANG C K, LI X L, ZHENG S, et al. Research on construction and application of knowledge graph based on large language model[J]. Journal of frontiers of computer science and technology, 2024, 18(10): 2656-2667.
15 ZHAO W X, ZHOU K, LI J, et al. A Survey of Large Language Models[J/OL]. arXiv, 2025. .
16 周正达, 王昊, 汪琳, 等. ChatKG: 一种基于大语言模型和提示工程的非遗知识图谱构建框架: 以中国非遗陶瓷制作工艺为例[J/OL]. 图书馆杂志, 2025: 1-30. .
ZHOU Z D, WANG H, WANG L, et al. ChatKG: A framework for constructing intangible cultural heritage knowledge graphs based on large language model and prompt engineering: A case study of Chinese intangible cultural heritage ceramics craft[J/OL]. Library journal, 2025: 1-30. .
17 陈宋生, 王明. 基于大语言模型的财会知识图谱构建及应用展望[J]. 会计之友, 2025(5): 152-161.
CHEN S S, WANG M. Construction and application prospect of accounting knowledge map based on large language model[J]. Friends of accounting, 2025(5): 152-161.
18 韦一金, 陈彦清, 王秀东, 等. 基于大语言模型的《中国小麦品种志》信息提取[J]. 数据与计算发展前沿(中英文), 2025, 7(1): 175-185.
WEI Y J, CHEN Y Q, WANG X D, et al. Information extraction from Chinese wheat varieties journal based on large language model[J]. Frontiers of data & computing, 2025, 7(1): 175-185.
19 皮乾坤, 卢记仓, 祝涛杰, 等. 一种基于大语言模型增强的零样本知识抽取方法[J/OL]. 计算机科学, 2025: 1-11. .
PI Q K, LU J C, ZHU T J, et al. A zero-shot knowledge extraction method based on large language model enhanced[J/OL]. Computer science, 2025: 1-11. .
20 WANG B, XU C, ZHAO X M, et al. MinerU: An open-source solution for precise document content extraction[J/OL]. arXiv, 2024. .
21 张文杰. 提示词治理: DeepSeek等国产大模型内容生成的人机协同模式[J/OL]. 苏州大学学报(哲学社会科学版), 2025: 1-12. .
ZHANG W J. Prompt governance: A study on human-machine collaboration models for content generation in the era of large language models baesd on DeepSeek[J/OL]. Journal of Soochow university (philosophy & social science edition), 2025: 1-12. .
22 SUN J, PAN Y T, YAN X H. Improving intermediate reasoning in zero-shot chain-of-thought for large language models with filter supervisor-self correction[J]. Neurocomputing, 2025, 620: 129219.
23 ADAMS G, FABBRI A R, LADHAK F, et al. From sparse to dense: GPT-4 summarization with chain of density prompting[J]. Proceedings of the conference on empirical methods in natural language processing conference on empirical methods in natural language processing, 2023, 2023(4th New Frontier Summarization Workshop): 68-74.
[1] ZHANG Xingwang, LI Jie, LI Sifan, WANG Xiaopei. Theoretical Model, Model Innovation, and Important Implications of DeepSeek Empowering Library Knowledge Services [J]. Journal of library and information science in agriculture, 2025, 37(1): 4-16.
[2] FAN Kexin, XIAN Guojian, ZHAO Ruixue, HUANG Yongwen, SUN Tan. Ontology Construction for Intelligent Control and Application of Crop Germplasm Resources [J]. Journal of library and information science in agriculture, 2024, 36(3): 92-107.
[3] CHEN Caiming, FENG Jianzhong, BAI Linyan, WANG Jian, XIE Nengfu, ZOU Jun. Representation Model of Agricultural Knowledge Graph Based on the HARP Framework [J]. Journal of library and information science in agriculture, 2023, 35(8): 66-77.
[4] FAN KeXin, SUN Tan, ZHAO RuiXue, KOU YuanTao, XIAN GuoJian. Comparison and Enlightenment of Crop Germplasm Resource Knowledge Service Platforms [J]. Journal of library and information science in agriculture, 2023, 35(5): 64-73.
[5] MA Weilu, XIAN Guojian, ZHAO Ruixue, LI Jiao, HUANG Yongwen, SUN Tan. Comparative Study and Optimization Strategies of Knowledge Graph Construction Management Systems [J]. Journal of library and information science in agriculture, 2023, 35(4): 19-31.
[6] CHANG ZhiJun, XU LiYuan, YU QianQian, ZHANG JianYong, WANG YongJi. Scientific and Technical Literature Data Management System Based on Life Cycle Model [J]. Journal of library and information science in agriculture, 2022, 34(6): 36-49.
[7] YANG Siluo, TIAN Peilin, ZHU Chuanyu, QIU Junping. Characteristics of UNESCO's Humanities and Social Sciences Research: Topic, Evolution and Cooperation [J]. Journal of library and information science in agriculture, 2021, 33(6): 6-17.
[8] XU Yongle, CHEN Yuanyuan, YANG Tingting, WAN Xiangli. Comparative Analysis of the Research on the Influence of Chinese and International Think Tanks [J]. Journal of library and information science in agriculture, 2021, 33(11): 50-62.
[9] LYU Lucheng, HAN Tao. Artificial Intelligence Empowers Library and Information Service ——Review of Forums about Information Technology for Library 2019 [J]. Journal of library and information science in agriculture, 2020, 32(5): 13-18.
[10] ZHANG Tao, SUN Ruiying, LI Zhongjun. Subject Clustering and Evolutionary Trend of Public Opinion Documents in China [J]. Journal of library and information science in agriculture, 2020, 32(2): 14-21.
[11] LI Zhongjun, SUN Ruiying, ZHANG Tao. Analysis of the Research Status of Public Opinion Ecology in China Based on Bibliometrics (2004-2019) [J]. Journal of library and information science in agriculture, 2020, 32(2): 5-13.
[12] CHEN Qingyun, CAO Jianfei, CHEN Rongzhen. Research and Practices From the Thesaurus to Knowledge Graph [J]. Journal of library and information science in agriculture, 2019, 31(1): 44-53.
[13] ZHI Yingying. Exploration on the Application of Machine Learning in Library Discover System —Taking the Discover Tool Yewno Based on Knowledge Graph as Example [J]. Journal of library and information science in agriculture, 2018, 30(7): 47-50.
[14] CHEN Fen, ZHU Tianxiu. Research on University Library's Subject Service Based on Bibliometrics and Knowledge Graph Analysis [J]. Journal of library and information science in agriculture, 2018, 30(1): 99-103.
[15] ZHAO Lai-juan. Visualized Analysis on the Research Hotspots of Mobile Library in China [J]. Journal of library and information science in agriculture, 2015, 27(8): 46-51.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!