Knowledge Model and Construction of Soybean Breeding

Zhihao GUAN1, Zhiyi SHAN2,3, Tian LI1, Ruixue ZHAO1,4

  1. Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing 100081
  2. National Science Library, Chinese Academy of Sciences, Beijing 100190
  3. Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190
  4. Key Laboratory of Knowledge Mining and Knowledge Services in Agricultural Converging Publishing, National Press and Publication Administration, Beijing 100081
  • Received: 2024-09-27 Online: 2025-03-18
  • Contact: Ruixue ZHAO


[Purpose/Significance] Soybean breeding knowledge is semantically ambiguous and has not yet been described in depth. To address this, a structured knowledge model was established that defines the key concepts involved in the breeding process and their interactions, standardizes how soybean breeding knowledge is defined and organized, and promotes a unified expression of that knowledge.

[Method/Process] The characteristics of the knowledge structure in soybean molecular breeding were analyzed and, following the Stanford seven-step method for ontology construction, a semantic model of soybean molecular breeding was built with the ontology editor Protégé 5.6.3. The soybean breeding concept ontology contains 48 classes, which clarify the concepts and hierarchical associations under traits, compounds, enrichment pathways and growth classification. Seven types of causal relationships and three types of static relationships were defined. Finally, an ontology-based knowledge graph was built from one PubMed article, and the knowledge unit with the Dt1 gene as the central node was queried.

[Results/Conclusions] This study integrated existing knowledge bases and ontologies related to soybean breeding, established a knowledge model at the biomolecular level for the field, and provides a reference for knowledge sharing and semantic integration. Compared with existing knowledge models, it analyzed the characteristics of the knowledge structure in soybean breeding, extracted the key entity types and relationship types involved in hypothesis generation, and built an ontology model on that basis, which can describe gene expression patterns in soybean growth and development more comprehensively. This helps to discover key genes associated with specific traits and to analyze the molecular regulatory networks underlying those traits, which in turn supports the precise design and optimization of breeding strategies. The knowledge model can be applied to knowledge discovery, causal reasoning and other scenarios in soybean breeding, supporting experimental design and promoting interdisciplinary communication. A limitation of this study is that the ontology was constructed manually, without automated natural language processing methods. In subsequent use, the model needs to keep pace with advances in soybean breeding: new concept types, concept names and relationship names should be added promptly according to the knowledge description needs of domain scientists, and the model should be maintained and expanded regularly.

Key words: semantic model, ontology construction, soybean breeding, knowledge organization

Multiple expressions of covariant relationships for the same trait in the article “POWR1 is a domestication gene pleiotropically regulating seed quality and yield in soybean”


Sources of the thesaurus

| Source knowledge base | Domain | Ontology name | Ontology description | Term categories |
|---|---|---|---|---|
| Crop ontology | Plant science | Soybean Ontology | Soybean traits | Abiotic stress; Agronomic; Biochemical; Morphological; Phenological |
| Plant ontology | Plant science | Plant Ontology; Plant Trait Ontology; Plant Experimental Conditions Ontology | Plant Ontology includes plant anatomical entity and plant structure development stage | plant anatomical entity; plant structure development stage; Plant Trait Ontology; Plant Experimental Conditions Ontology |
| SoyBase | Plant science | Soybean Ontologies | growth stages; plant structure names; development; plant traits | Soybean Whole Plant Growth Ontology; Soybean Structure Ontology; Soybean Developmental Ontology; Soybean Trait Ontology |
| CAB thesaurus | Life sciences | CAB thesaurus | Specialized life-science terminology, including names of plants, animals and microorganisms | Subject categories: AM-Anatomical and Morphological Structures; CH-Chemicals and Chemical Groups; CL-Climate Related; DT-Diseases, Disorders, and Symptoms; OG-Organism Groups; PR-Properties; PZ-Natural Processes; SO-Soil Types |
| AGROVOC | Agronomy | AGROVOC Multilingual Thesaurus | General terminology for the agricultural domain | organisms; methods; groups; phenomena; resources; ... |
| Gene Ontology | Life sciences | Gene Ontology | Molecular Function; Cellular Component; Biological Process | Gene/product name; Direct annotation; Involved in |
| UniProt | Life sciences | controlled vocabulary for “Comments section” | functional information of protein | function; subcellular location; ptm/processing; GO; expressing; interaction |
| Gene Regulation Ontology | Life sciences | Gene Regulation Ontology | Genomic and Proteomic | continuant; gene product; molecular binding; occurrent |

Fig. 2

Conceptual ontology of soybean breeding

Table 2

Sources of terms in the ontology of soybean breeding concepts

| Entity type | Entity description | Data source |
|---|---|---|
| No_Genomic_Products | “small” chemical compounds, except for molecules directly encoded by the genome (e.g. nucleic acids, proteins and peptides derived from proteins by cleavage) | ChEBI; Plant Metabolic Network |
| DNA | gene name | Gene Ontology |
| miRNA | microRNA name | miRbase |
| Protein | protein name | Gene Ontology |
| Biological_Process | biological programs accomplished by multiple molecular activities | Gene Ontology |
| Cellular_Component | a location, relative to cellular compartments and structures, occupied by a macromolecular machine | Gene Ontology |
| Metabolic_Pathway | the name of a pathway, including energy metabolism, lipid metabolism, nucleotide metabolism, amino acid metabolism, etc. | KEGG |
| Molecular_Function | molecular-level activities performed by gene products | Gene Ontology |
| Anatomical_Space | both material entities such as plant structures and immaterial entities such as plant anatomical spaces | Plant ontology |
| Development_Stages | describes the growth and maturation of individual soybean tissues or organ systems | SoyBase |
| Structure | enumerates the parts of a soybean plant in order to follow the temporal development of individual plants | SoyBase |
| Trait | includes yield, biotic and abiotic stress resistance, and growth and developmental traits | SoyBase |


Ontology of soybean breeding relationships

| Category | Relation name | Definition | Domain | Range |
|---|---|---|---|---|
| Causal Relation | expressed in | a DNA or gene product is expressed in an anatomical space, growth stage or structure | genomic | development |
| Causal Relation | interacts with | a compound interacts with another protein | compound | compound |
| Causal Relation | positively regulates | a compound positively regulates the accumulation of another compound or the expression of a trait | compound | compound/trait |
| Causal Relation | negatively regulates | a compound negatively regulates the accumulation of another compound or the expression of a trait | compound | compound/trait |
| Causal Relation | associates with | a compound associates with a trait, a correlation between two traits, or a correlation between two compounds | compound/trait | compound/trait |
| Causal Relation | encodes | a gene encodes a gene product | DNA | protein |
| Causal Relation | involved in | a gene is involved in a process | DNA | molecular function/biological process/metabolic pathway |
| Static Relation | binds to | a compound physically binds to another compound | compound | compound |
| Static Relation | localized in | a compound is present during a development phase | compound | cellular component |
| Static Relation | exists at | a compound is found in a tissue | compound | development |
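
The domain and range columns of the relationship ontology can be read as constraints on which triples are admissible. The following is a minimal Python sketch of such a check, not the authors' implementation: the relation names, categories, domains and ranges are copied from the table, while `RelationSpec`, `spec`, `RELATIONS` and `is_valid_triple` are hypothetical names introduced only for illustration, and the type labels are treated as plain strings rather than ontology classes.

```python
from typing import NamedTuple


class RelationSpec(NamedTuple):
    category: str            # "causal" or "static"
    domain_types: frozenset  # admissible subject types
    range_types: frozenset   # admissible object types


def spec(category, domain_types, range_types):
    """Small helper (hypothetical) to keep the table below readable."""
    return RelationSpec(category, frozenset(domain_types), frozenset(range_types))


# Relation names, domains and ranges as listed in the relationship table.
RELATIONS = {
    "expressed in": spec("causal", ["genomic"], ["development"]),
    "interacts with": spec("causal", ["compound"], ["compound"]),
    "positively regulates": spec("causal", ["compound"], ["compound", "trait"]),
    "negatively regulates": spec("causal", ["compound"], ["compound", "trait"]),
    "associates with": spec("causal", ["compound", "trait"], ["compound", "trait"]),
    "encodes": spec("causal", ["DNA"], ["protein"]),
    "involved in": spec(
        "causal",
        ["DNA"],
        ["molecular function", "biological process", "metabolic pathway"],
    ),
    "binds to": spec("static", ["compound"], ["compound"]),
    "localized in": spec("static", ["compound"], ["cellular component"]),
    "exists at": spec("static", ["compound"], ["development"]),
}


def is_valid_triple(subject_type, relation, object_type):
    """Return True if the relation is known and both types satisfy its constraints."""
    rel = RELATIONS.get(relation)
    if rel is None:
        return False
    return subject_type in rel.domain_types and object_type in rel.range_types


print(is_valid_triple("DNA", "encodes", "protein"))    # True
print(is_valid_triple("trait", "encodes", "protein"))  # False
```

A check of this kind could be run over extracted triples before they are loaded into a knowledge graph, so that statements violating the declared domain or range are flagged for manual review.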


Semantic ontology model


Conceptual class hierarchy diagram of "soybean breeding knowledge"


Hierarchical diagram of "soybean breeding knowledge" relationship classes


An example knowledge graph based on a PubMed article


The knowledge unit with "Dt1" as the central node
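
Querying a knowledge unit around a central node amounts to collecting every triple in which that node appears as subject or object. The sketch below illustrates the idea in plain Python under stated assumptions: the triples are placeholders invented for illustration (`protein_A`, `trait_B`, `gene_C` and `trait_D` are not entities or findings from the paper), and `knowledge_unit` and `neighbours` are hypothetical helper names, not functions of any graph database or of Protégé.

```python
from collections import defaultdict

# Placeholder triples (subject, relation, object); NOT data from the paper.
TRIPLES = [
    ("Dt1", "encodes", "protein_A"),
    ("Dt1", "associates with", "trait_B"),
    ("gene_C", "interacts with", "Dt1"),
    ("gene_C", "associates with", "trait_D"),  # does not touch Dt1
]


def knowledge_unit(triples, center):
    """Return all triples in which `center` is the subject or the object."""
    return [t for t in triples if t[0] == center or t[2] == center]


def neighbours(triples, center):
    """Group the nodes directly linked to `center` by relation name."""
    grouped = defaultdict(set)
    for subject, relation, obj in knowledge_unit(triples, center):
        other = obj if subject == center else subject
        grouped[relation].add(other)
    return dict(grouped)


for triple in knowledge_unit(TRIPLES, "Dt1"):
    print(triple)
```

In a graph store the same one-hop neighbourhood would normally be expressed as a query in that store's own language; the list comprehension here only shows the selection logic.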

