Journal of Library and Information Science in Agriculture ›› 2022, Vol. 34 ›› Issue (8): 4-18. doi: 10.13998/j.cnki.issn1002-1248.22-0101

• Special Topic: Deep Learning in Agriculture •

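Applications and Prospect Analysis of Deep Learning in Plant Genomics and Crop Breeding

HOU Xiangying1, CUI Yunpeng2,*, LIU Juan2

  1. Zibo Academy of Agricultural Sciences, Zibo 255020;
  2. Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Institute of Agricultural Information, Chinese Academy of Agricultural Sciences, Beijing 100081
  • Received: 2022-02-28 Online: 2022-08-05 Published: 2022-10-26
  • Corresponding author: *CUI Yunpeng, researcher and director of the Agricultural Big Data Mining Research Office, Institute of Agricultural Information, Chinese Academy of Agricultural Sciences. Email: cuiyunpeng@caas.cn
  • About the authors: HOU Xiangying (1971- ), female, agronomist, Zibo Academy of Agricultural Sciences; research interests: cultivation and breeding of cash crops. LIU Juan (1978- ), female, associate researcher; research interests: agricultural big data applications and data governance
  • Funding:
    Chinese Academy of Agricultural Sciences academy-level project "Deep Analysis Technology for Crop Breeding" (2020ZLK005); Major Project of the National Social Science Fund of China "Collection, Collation, and Research of Ancient Chinese Agricultural Books" (21&ZD332)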

Abstract: [Purpose/Significance] Advances in single-cell sequencing and high-throughput technologies have allowed plant genomics to accumulate, at low cost, large quantities of data describing multidimensional genome-wide molecular phenotypes. As powerful data mining tools, deep learning techniques can be used to further predict and interpret these molecular phenotypes. Recent studies show that deep learning achieves strong results in plant genomics and crop breeding research tasks. However, a complete review of deep learning applications in plant genomics is still lacking.
[Method/Process] Deep learning models applied to genomics usually take biological sequences as predictor variables and molecular phenotypes as target variables. We describe the workflow in four stages: input data pre-processing (retrieval, encoding, and splitting), model construction and training (selection of model architecture and hyperparameters), model evaluation, and model interpretation. The paper first introduces the background of deep learning approaches, including the latest graph neural networks. It then discusses, from the perspectives of gene characterization and protein characterization, two prominent issues at the intersection of genomics and deep learning: 1) how to model the flow of information from plant genomic DNA sequences to molecular phenotypes; and 2) how deep learning models can be used to identify functional variation in natural populations. It also summarizes the current status of deep learning applications in related fields: DNA and gene characterization, interpretability of deep learning in genomics, graph neural networks in genomics, genomic variation research, protein prediction (including AlphaFold), crop breeding research, and unsupervised learning in genomics and protein characterization.
[Results/Conclusions] This article summarizes how traditional deep learning algorithms, graph deep learning, generative adversarial networks, and interpretable AI are applied in current research to address these two problems. Finally, the prospects for deep learning in future plant genomics research and crop genetic improvement are discussed. Overall, deep learning has outperformed conventional methods in many genomics research directions, and its use in genomics has already produced early applications of scientific and economic significance. Deep learning offers two distinct advantages: 1) end-to-end learning, which can integrate multiple pre-processing steps into a single model; and 2) multimodal data processing, which can handle the extremely heterogeneous data found in genomics. As algorithms become more accurate, deep learning has the potential to open new research perspectives in genomics and crop breeding and to facilitate larger-scale association studies at both the phenotypic and genotypic levels.
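The pre-processing stage named in the abstract (encoding sequences and splitting the data) can be illustrated with a minimal Python sketch. The abstract does not specify an encoding scheme or split ratio, so the one-hot encoding, the `ACGT` column order, the 80/20 split, and the names `one_hot_encode` and `split_dataset` are all assumptions made for illustration, and the toy sequences and target values are invented.

```python
import random

# Assumed column order; the abstract does not specify an encoding scheme.
ALPHABET = "ACGT"


def one_hot_encode(sequence):
    """Encode a DNA string as rows of four 0/1 values.

    Bases outside ALPHABET (for example the ambiguity code N) become all zeros.
    """
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows


def split_dataset(examples, test_fraction=0.2, seed=0):
    """Shuffle (input, target) pairs reproducibly and split into train and test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy data: hypothetical sequences paired with made-up molecular phenotype values.
toy_examples = [("ACGT", 0.1), ("TTGA", 0.7), ("GGCN", 0.3), ("CATG", 0.9), ("AACC", 0.5)]
encoded = [(one_hot_encode(seq), target) for seq, target in toy_examples]
train_set, test_set = split_dataset(encoded)
```

A purely random split is used here only for brevity; with real genomic data, related sequences can end up on both sides of such a split, so a grouped split (for example by chromosome) may be more appropriate.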

Key words: plant genomics, crop breeding, deep learning, graph deep learning, review

CLC number:

  • S-1

Cite this article

HOU Xiangying, CUI Yunpeng, LIU Juan. Applications and Prospect Analysis of Deep Learning in Plant Genomics and Crop Breeding[J]. Journal of Library and Information Science in Agriculture, 2022, 34(8): 4-18.