
Journal of Library and Information Science in Agriculture ›› 2022, Vol. 34 ›› Issue (8): 4-18. doi: 10.13998/j.cnki.issn1002-1248.22-0101


Applications and Prospect Analysis of Deep Learning in Plant Genomics and Crop Breeding

HOU Xiangying1, CUI Yunpeng2,*, LIU Juan2   

    1. Zibo Academy of Agricultural Sciences, Zibo 255020;
    2. Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Institute of Agricultural Information, Chinese Academy of Agricultural Sciences, Beijing 100081
  • Received: 2022-02-28 Online: 2022-08-05 Published: 2022-10-26

Abstract: [Purpose/Significance] Advances in single-cell sequencing and high-throughput technologies have enabled plant genomics to accumulate, at low cost, large quantities of data describing multidimensional, genome-wide molecular phenotypes. As powerful data mining tools, deep learning techniques can be used to predict and interpret these molecular phenotypes. Recent studies show that deep learning yields significant results in plant genomics and crop breeding research, yet a comprehensive review of its applications in plant genomics is lacking.
[Method/Process] Deep learning models applied to genomics usually take biological sequences as predictor variables and molecular phenotypes as target variables. We describe the workflow in four stages: input data pre-processing (retrieval, encoding, and splitting); model construction and training (selection of model architecture and hyperparameters); model evaluation; and model interpretability. The paper first introduces the background of deep learning approaches, including recent graph neural networks. It then discusses two prominent questions at the intersection of genomics and deep learning, with respect to gene characterization and protein characterization: 1) how to model the flow of information from plant genomic DNA sequences to molecular phenotypes; and 2) how deep learning models can be used to identify functional variation in natural populations. It also summarizes the current status of deep learning applications in related areas: DNA and gene characterization, interpretability of deep learning in genomics, graph neural networks in genomics, genomic variation research, protein prediction (including AlphaFold), crop breeding research, and unsupervised learning in genomics and protein characterization.
[Results/Conclusions] This article summarizes how conventional deep learning algorithms, graph deep learning, generative adversarial networks, and interpretable AI are applied in current research to address these two questions, and discusses the prospects for deep learning in future plant genomics research and crop improvement. Overall, deep learning has outperformed conventional methods in many genomics research directions and has already produced early applications of scientific and economic significance. It offers two distinct advantages: 1) end-to-end learning, which can integrate multiple pre-processing steps into a single model; and 2) multimodal data processing, which can handle the highly heterogeneous data found in genomics. As algorithms become more accurate, advances in deep learning have the potential to open new research perspectives in genomics and crop breeding and to facilitate larger-scale association studies in both phenotypic and genotypic genomics.
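As an illustration of the pre-processing stage mentioned in the abstract (coding and splitting of sequence data), the following is a minimal sketch only. The abstract does not specify a coding scheme or split ratios, so the one-hot scheme, the function names `one_hot_encode` and `split_dataset`, the 80/10/10 fractions, and the toy data are all assumptions made for this example.

```python
import random

# Assumed alphabet; the abstract does not name a specific coding scheme.
ALPHABET = "ACGT"


def one_hot_encode(seq):
    """Encode a DNA string as a list of 4-element rows.

    Bases outside ACGT (for example N) are encoded as all zeros.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def split_dataset(items, train_frac=0.8, valid_frac=0.1, seed=0):
    """Shuffle and split items into train, validation, and test lists.

    The fractions are illustrative defaults, not values from the paper.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# Toy (sequence, molecular phenotype) pairs; the values are made up.
pairs = [("ACGT" * 2, float(i)) for i in range(10)]
encoded = [(one_hot_encode(seq), target) for seq, target in pairs]
train, valid, test = split_dataset(encoded)
```

Here the encoded sequences play the role of predictor variables and the paired numbers stand in for molecular phenotypes as target variables. In practice, genomic data sets are often split by chromosome or gene family rather than at random, to limit leakage between related sequences; the random split above is only the simplest case.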

Key words: plant genomics, crop breeding, deep learning, graph deep learning, review

CLC Number: S-1