
Journal of Library and Information Science in Agriculture

   

Knowledge Model and Construction of Soybean Breeding

Zhihao GUAN1, Zhiyi SHAN2,3, Tian LI1, Ruixue ZHAO1,4

    1. Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing 100081
    2. National Science Library, Chinese Academy of Sciences, Beijing 100190
    3. Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190
    4. Key Laboratory of Knowledge Mining and Knowledge Services in Agricultural Converging Publishing, National Press and Publication Administration, Beijing 100081
  • Received: 2024-09-27; Online: 2025-03-18
  • Contact: Ruixue ZHAO

Abstract:

[Purpose/Significance] Soybean breeding knowledge suffers from semantic ambiguity, and much of it still needs to be revealed in depth. To address these problems, a structured knowledge model was established that examines the definitions of the key concepts involved in the breeding process and their interactions, standardizes how soybean breeding knowledge is defined and organized, and promotes its unified expression. [Method/Process] After analyzing the characteristics of the knowledge structure in soybean molecular breeding, we followed the Stanford seven-step ontology construction method and used the ontology editor Protégé 5.6.3 to build a semantic model of soybean molecular breeding. The soybean breeding concept ontology comprises 48 classes, which clarify the concepts under traits, compounds, enrichment pathways, and growth classification, together with the hierarchical associations among them. Seven types of causal relationships and three types of static relationships were defined. Finally, an ontology-based knowledge graph was built from a PubMed article, and the knowledge unit with the Dt1 gene as its central node was queried. [Results/Conclusions] This study integrated existing knowledge bases and ontologies related to soybean breeding, established a knowledge model at the biomolecular level for the field, and provides a reference for knowledge sharing and semantic integration in this area. Compared with existing knowledge models, this study analyzed the characteristics of the knowledge structure in soybean breeding, extracted the key entity types and relationship types involved in hypothesis generation, and built an ontology model on that basis, which can describe gene expression patterns during soybean growth and development more comprehensively.
This is of great significance for discovering key genes associated with specific traits and for analyzing the molecular regulatory networks that shape those traits, which helps in designing and optimizing breeding strategies precisely. The knowledge model can be applied to knowledge discovery, causal reasoning, and other scenarios in soybean breeding, supporting experimental design and promoting interdisciplinary communication. A limitation of this study is that the ontology was constructed manually, without automated natural language processing. In addition, as the soybean breeding knowledge model is put to use, it must keep pace with the research frontier: new concept types should be added, new concept and relationship names introduced promptly according to the knowledge description needs of domain scientists, and the model maintained and extended regularly.
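To make the causal-reasoning scenario concrete, the following minimal sketch chains triples whose relation names come from Table 3 into candidate gene-to-trait explanation paths. It is an illustration under stated assumptions, not the authors' implementation: the entities (gene_A, protein_B, compound_C, trait_D, protein_E), the choice of breadth-first search, and the `max_hops` cut-off are all hypothetical.

```python
from collections import deque

# Causal relation names from Table 3; static relations (binds to, localized in,
# exists at) are deliberately not followed when building explanation paths.
CAUSAL_RELATIONS = {
    "expressed in", "interacts with", "positively regulates",
    "negatively regulates", "associates with", "encodes", "involved in",
}

# Hypothetical placeholder triples (subject, relation, object), not data from the paper.
TRIPLES = [
    ("gene_A", "encodes", "protein_B"),
    ("protein_B", "interacts with", "protein_E"),
    ("protein_B", "positively regulates", "compound_C"),
    ("compound_C", "negatively regulates", "trait_D"),
    ("protein_E", "binds to", "compound_C"),  # static relation: ignored below
]


def causal_paths(triples, source, target, max_hops=4):
    """Breadth-first search for simple paths that use only causal relations."""
    adjacency = {}
    for subj, rel, obj in triples:
        if rel in CAUSAL_RELATIONS:
            adjacency.setdefault(subj, []).append((rel, obj))

    found = []
    queue = deque([(source, [])])
    while queue:
        node, path = queue.popleft()
        if node == target and path:
            found.append(path)
            continue
        if len(path) >= max_hops:
            continue
        on_path = {source} | {obj for _, _, obj in path}  # avoid revisiting nodes
        for rel, obj in adjacency.get(node, []):
            if obj not in on_path:
                queue.append((obj, path + [(node, rel, obj)]))
    return found


for path in causal_paths(TRIPLES, "gene_A", "trait_D"):
    print(" ; ".join(f"{s} --{r}--> {o}" for s, r, o in path))
```

With these placeholder triples the only path found is gene_A → protein_B → compound_C → trait_D; the route through protein_E is cut because "binds to" is a static relation. Each path is a hypothesis to be tested experimentally, not a conclusion.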

Key words: semantic model, ontology construction, soybean breeding, knowledge organization

CLC Number: G250

Fig. 1

Multiple expressions of covariant relationships for the same trait in the article titled “POWR1 is a domestication gene pleiotropically regulating seed quality and yield in soybean”

Table 1

Sources of thesauri

| Source knowledge base | Domain | Ontology name | Ontology description | Top-level term categories |
| --- | --- | --- | --- | --- |
| Crop Ontology | Plant science | Soybean Ontology | Soybean traits | Abiotic stress, Agronomic, Biochemical, Morphological, Phenological |
| Plant Ontology | Plant science | Plant Ontology, Plant Trait Ontology, Plant Experimental Conditions Ontology | Plant Ontology includes plant anatomical entities and plant structure development stages | plant anatomical entity, plant structure development stage, Plant Trait Ontology, Plant Experimental Conditions Ontology |
| SoyBase | Plant science | Soybean Ontologies | growth stages, plant structure names, development, plant traits | Soybean Whole Plant Growth Ontology, Soybean Structure Ontology, Soybean Developmental Ontology, Soybean Trait Ontology |
| CAB Thesaurus | Life science | CAB Thesaurus | Specialized life-science terminology, including names of plants, animals, and microorganisms | Subject categories: AM-Anatomical and Morphological Structures; CH-Chemicals and Chemical Groups; CL-Climate Related; DT-Diseases, Disorders, and Symptoms; OG-Organism Groups; PR-Properties; PZ-Natural Processes; SO-Soil Types |
| AGROVOC | Agronomy | AGROVOC Multilingual Thesaurus | General terminology of the agricultural domain | organisms, methods, groups, phenomena, resources, etc. |
| Gene Ontology | Life science | Gene Ontology | Molecular Function, Cellular Component, Biological Process | Gene/product name, Direct annotation, Involved in |
| UniProt | Life science | Controlled vocabulary for the "Comments" section | Functional information on proteins | function, subcellular location, PTM/processing, GO, expression, interaction |
| Gene Regulation Ontology | Life science | Gene Regulation Ontology | | Genomic and Proteomic continuant, gene product, molecular binding, occurrent |
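Because the vocabularies in Table 1 overlap, integrating them typically starts by normalizing labels and merging duplicates while keeping track of which sources contributed each term. The sketch below assumes a deliberately simple lexical normalization (case-folding and collapsing spaces, underscores, and hyphens); the vocabulary names vocab_A to vocab_C and their labels are hypothetical, not the actual term lists.

```python
import re
from collections import defaultdict


def normalize(label):
    """Case-fold a label and collapse runs of spaces, underscores and hyphens."""
    return re.sub(r"[\s_\-]+", " ", label).strip().casefold()


def merge_terms(sources):
    """Map each normalized label to the sorted names of the vocabularies using it."""
    merged = defaultdict(set)
    for vocabulary, labels in sources.items():
        for label in labels:
            merged[normalize(label)].add(vocabulary)
    return {label: sorted(vocabs) for label, vocabs in merged.items()}


# Hypothetical vocabularies and labels, for illustration only.
SOURCES = {
    "vocab_A": ["Biological Process", "Molecular_Function"],
    "vocab_B": ["plant anatomical entity", "Plant Structure Development Stage"],
    "vocab_C": ["plant-structure development stage", "biological process"],
}

for label, vocabs in sorted(merge_terms(SOURCES).items()):
    print(f"{label}: {', '.join(vocabs)}")
```

Lexical matching of this kind is only a first pass: identical strings can still denote different concepts across vocabularies, so merged candidates would need review by domain experts before entering the ontology.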

Fig. 2

Conceptual ontology of soybean breeding

Table 2

Sources of terms in the soybean breeding concept ontology

| Entity class | Description | Data source |
| --- | --- | --- |
| No_Genomic_Products | "small" chemical compounds, excluding molecules directly encoded by the genome (e.g., nucleic acids, proteins, and peptides derived from proteins by cleavage) | ChEBI, Plant Metabolic Network |
| DNA | gene names | Gene Ontology |
| miRNA | microRNA names | miRBase |
| Protein | protein names | Gene Ontology |
| Biological_Process | biological programs accomplished by multiple molecular activities | Gene Ontology |
| Cellular_Component | locations, relative to cellular compartments and structures, occupied by a macromolecular machine | Gene Ontology |
| Metabolic_Pathway | pathway names, covering energy, lipid, nucleotide, and amino acid metabolism, among others | KEGG |
| Molecular_Function | molecular-level activities performed by gene products | Gene Ontology |
| Anatomical_Space | both material entities, such as plant structures, and immaterial entities, such as plant anatomical spaces | Plant Ontology |
| Development_Stages | the growth and maturation of individual soybean tissues or organ systems | SoyBase |
| Structure | the parts of a soybean plant, enumerated so as to follow the temporal development of individual plants | SoyBase |
| Trait | yield, biotic and abiotic stress resistance, and growth and developmental traits | SoyBase |
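One way to carry the provenance in Table 2 into an OWL file that Protégé can open is to declare each entity class together with its data sources. The sketch below emits Turtle for a subset of the classes; the base IRI is a hypothetical placeholder (the paper does not publish its namespace), and annotating with `dcterms:source` plain literals is one assumed choice among several (in practice the annotation might instead point at the IRI of the originating term).

```python
# Hypothetical base IRI; the paper does not publish its namespace.
BASE_IRI = "http://example.org/soybean-breeding#"

# A subset of Table 2: entity class -> data sources.
ENTITY_SOURCES = {
    "No_Genomic_Products": ["ChEBI", "Plant Metabolic Network"],
    "DNA": ["Gene Ontology"],
    "miRNA": ["miRBase"],
    "Protein": ["Gene Ontology"],
    "Metabolic_Pathway": ["KEGG"],
    "Trait": ["SoyBase"],
}


def to_turtle(entity_sources, base=BASE_IRI):
    """Render each entity class as an owl:Class annotated with its sources."""
    lines = [
        f"@prefix : <{base}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
    ]
    for name, sources in entity_sources.items():
        # Several objects for one predicate are separated by commas in Turtle.
        objects = ", ".join(f'"{source}"' for source in sources)
        lines.append(f":{name} a owl:Class ; dcterms:source {objects} .")
    return "\n".join(lines)


print(to_turtle(ENTITY_SOURCES))
```

Subclass axioms (rdfs:subClassOf) would be added the same way once the class hierarchy is fixed; they are omitted here because the full hierarchy is only given graphically.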

Table 3

Ontology of soybean breeding relationships

| Category | Relation | Definition | Domain | Range |
| --- | --- | --- | --- | --- |
| Causal relation | expressed in | a DNA or gene product is expressed in an anatomical space, growth stage, or structure | genomic | development |
| Causal relation | interacts with | a compound interacts with another protein | compound | compound |
| Causal relation | positively regulates | a compound positively regulates the accumulation of another compound or the expression of a trait | compound | compound/trait |
| Causal relation | negatively regulates | a compound negatively regulates the accumulation of another compound or the expression of a trait | compound | compound/trait |
| Causal relation | associates with | a compound is associated with a trait, or two traits or two compounds are correlated | compound/trait | compound/trait |
| Causal relation | encodes | a gene encodes a gene product | DNA | protein |
| Causal relation | involved in | a gene is involved in a process | DNA | molecular function/biological process/metabolic pathway |
| Static relation | binds to | a compound physically binds to another compound | compound | compound |
| Static relation | localized in | a compound is present during a development phase | compound | cellular component |
| Static relation | exists at | a compound is found in a tissue | compound | development |
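The domain and range columns of Table 3 act as constraints on which typed triples the ontology admits. The sketch below transcribes them and checks a triple against them. The superclass links in `PARENTS` (for example, treating a protein as both a genomic entity and a compound) are assumptions made for illustration only; the actual class hierarchy is the one shown in Fig. 4.

```python
# Domain and range of each relation, transcribed from Table 3.
SCHEMA = {
    "expressed in": ({"genomic"}, {"development"}),
    "interacts with": ({"compound"}, {"compound"}),
    "positively regulates": ({"compound"}, {"compound", "trait"}),
    "negatively regulates": ({"compound"}, {"compound", "trait"}),
    "associates with": ({"compound", "trait"}, {"compound", "trait"}),
    "encodes": ({"DNA"}, {"protein"}),
    "involved in": ({"DNA"}, {"molecular function", "biological process",
                              "metabolic pathway"}),
    "binds to": ({"compound"}, {"compound"}),
    "localized in": ({"compound"}, {"cellular component"}),
    "exists at": ({"compound"}, {"development"}),
}

# Assumed superclass links, for illustration only (not taken from the paper).
PARENTS = {
    "DNA": {"genomic"},
    "protein": {"genomic", "compound"},  # a gene product that is also a compound
    "growth stage": {"development"},
}


def ancestors(cls):
    """Return cls together with every superclass reachable through PARENTS."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def is_valid(subject_class, relation, object_class):
    """True if the typed triple respects the relation's domain and range."""
    domain, range_ = SCHEMA[relation]
    return bool(ancestors(subject_class) & domain) and bool(ancestors(object_class) & range_)


print(is_valid("DNA", "encodes", "protein"))                 # True
print(is_valid("protein", "encodes", "DNA"))                 # False: direction reversed
print(is_valid("protein", "expressed in", "growth stage"))   # True
```

Note that OWL reasoners interpret rdfs:domain and rdfs:range as inference rules (they infer types) rather than as validation constraints, so a check like this, or a SHACL shape, is the usual way to flag triples that violate the intended schema.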

Fig. 3

Semantic ontology model

Fig. 4

Conceptual class hierarchy of "soybean breeding knowledge"

Fig. 5

Hierarchy of "soybean breeding knowledge" relationship classes

Fig. 6

An example knowledge graph built from a PubMed article

Fig. 7

The knowledge unit with "Dt1" as the central node
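A knowledge unit such as the one in Fig. 7 can be read as the one-hop neighbourhood of a central node. The minimal sketch below collects the outgoing and incoming edges of a chosen node from a list of triples. Only the node name "Dt1" comes from the paper; every other entity and every edge is a hypothetical placeholder, and the query ignores class typing.

```python
from collections import defaultdict


def knowledge_unit(triples, center):
    """Group the one-hop neighbourhood of `center` into outgoing and incoming edges."""
    unit = defaultdict(list)
    for subj, rel, obj in triples:
        if subj == center:
            unit["outgoing"].append((rel, obj))
        elif obj == center:
            unit["incoming"].append((subj, rel))
    return dict(unit)


# Only "Dt1" is taken from the paper; every other entity is a hypothetical placeholder.
TRIPLES = [
    ("Dt1", "encodes", "protein_P1"),
    ("Dt1", "involved in", "process_Q1"),
    ("compound_C2", "negatively regulates", "Dt1"),
    ("protein_P1", "interacts with", "protein_P3"),  # two hops away: excluded
]

for direction, edges in knowledge_unit(TRIPLES, "Dt1").items():
    print(direction, edges)
```

Widening the unit to two hops would repeat the same lookup for each neighbour; in a triple store the equivalent query would typically be a SPARQL pattern matching the center node as either subject or object.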
