Journal of Library and Information Science in Agriculture, 2022, Vol. 34, Issue (8): 4-18. doi: 10.13998/j.cnki.issn1002-1248.22-0101

Applications and Prospect Analysis of Deep Learning in Plant Genomics and Crop Breeding

HOU Xiangying1, CUI Yunpeng2,*, LIU Juan2   

  1. Zibo Academy of Agricultural Sciences, Zibo 255020;
  2. Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Institute of Agricultural Information, Chinese Academy of Agricultural Sciences, Beijing 100081
  • Received:2022-02-28 Online:2022-08-05 Published:2022-10-26

Abstract: [Purpose/Significance] Advances in single-cell sequencing and high-throughput technologies have allowed plant genomics to accumulate, at low cost, large quantities of data describing multidimensional genome-wide molecular phenotypes. As powerful data mining tools, deep learning techniques can be used to predict and interpret these molecular phenotypes. Recent studies show that deep learning yields significant results in plant genomics and crop breeding research, yet a comprehensive review of its applications in plant genomics is lacking. [Method/Process] When deep learning is applied to genomics, biological sequences usually serve as the predictor variables and molecular phenotypes as the target variables. We describe the workflow from four aspects: input data pre-processing (retrieval, encoding, and splitting); model construction and training (selection of model architecture and hyperparameters); model evaluation; and model interpretability. The paper first introduces the background of deep learning approaches, including the latest graph neural networks. It then discusses two prominent questions at the intersection of genomics and deep learning, with respect to gene characterization and protein characterization: 1) how to model the flow of information from plant genomic DNA sequences to molecular phenotypes; and 2) how deep learning models can be used to identify functional variation in natural populations. It also summarizes the current status of deep learning applications in related fields: DNA and gene characterization, interpretability of deep learning in genomics applications, graph neural networks in genomics, genomic variation research, protein prediction, AlphaFold in protein prediction, crop breeding research, and unsupervised learning in genomics and protein characterization. [Results/Conclusions] This article summarizes how conventional deep learning algorithms, graph deep learning, generative adversarial networks, and interpretable AI are applied in current research to address these two questions. Finally, the prospects for deep learning in future plant genomics research and crop improvement are discussed. Overall, deep learning has outperformed conventional methods in many genomics research directions and has already produced early applications of scientific and economic significance. Deep learning offers two distinct advantages: 1) end-to-end learning, with the ability to integrate multiple pre-processing steps into a single model; and 2) multimodal data processing, which can handle the extremely heterogeneous data found in genomics. As algorithms become more accurate, the advancement of deep learning has the potential to open new research perspectives in genomics and crop breeding and to facilitate larger-scale association studies of both phenotypes and genotypes.
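The encoding and splitting steps of the pre-processing stage described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the A/C/G/T channel order, the all-zero treatment of ambiguous bases, the random split, and the toy data are all assumptions chosen for illustration.

```python
import random

ALPHABET = "ACGT"  # assumed channel order; any fixed order works


def one_hot_encode(sequence):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Bases outside the alphabet, such as N, become an all-zero row.
    """
    rows = []
    for base in sequence.upper():
        rows.append([1.0 if base == letter else 0.0 for letter in ALPHABET])
    return rows


def split_dataset(examples, test_fraction=0.2, seed=0):
    """Shuffle examples reproducibly and return (train, test) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy data: (sequence, molecular phenotype) pairs.
toy_data = [("ACGT", 0.9), ("TTGA", 0.1), ("GGCN", 0.5),
            ("CATG", 0.7), ("AACC", 0.3)]

# Sequences are the predictors; phenotypes are the targets.
encoded = [(one_hot_encode(seq), target) for seq, target in toy_data]
train, test = split_dataset(encoded, test_fraction=0.2, seed=0)
```

A purely random split is used here for brevity. For real genomic data a split by chromosome or by sequence family may be more appropriate, since closely related sequences in both sets can inflate the evaluation scores.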

Key words: plant genomics, crop breeding, deep learning, graph deep learning, review

CLC Number: S-1