
Journal of Library and Information Science in Agriculture, 2021, Vol. 33, Issue (1): 4-16. doi: 10.13998/j.cnki.issn1002-1248.20-1197


• Special manuscript •

Review on the Application and Development Strategies of Text Mining in Agriculture Knowledge Services

SUN Tan1, DING Pei2, HUANG Yongwen3,*, XIAN Guojian3   

  1. Chinese Academy of Agricultural Sciences, Beijing 100081;
  2. Shenzhen University Library, Shenzhen 518060;
  3. Agricultural Information Institute of CAAS, Beijing 100081
  • Received: 2020-11-20 Online: 2021-01-05 Published: 2021-02-05

Abstract: [Purpose/Significance] As scientific and technological innovation increasingly relies on data-intensive scientific discovery, a new form of knowledge service is taking shape. Text mining, a core technology of knowledge services, faces new challenges in this environment. This paper discusses development strategies for delivering knowledge services with text mining under these conditions. [Method/Process] This paper reviews the technical framework of text mining and shows that the technology is gradually maturing. Taking research and practice in agriculture as a case study, covering information retrieval, intelligent question answering, information monitoring, and knowledge extraction, it shows that text mining has performed well in both scientific and technological innovation and industrial applications. [Results/Conclusions] This paper proposes development strategies for knowledge service technology suited to China's conditions: (1) constructing a specialized knowledge service system based on text mining technologies, (2) attaching importance to the construction of corpora and basic knowledge bases, and (3) prioritizing the deployment of knowledge service technologies in key areas.

Key words: text mining, knowledge services, information extraction, knowledge organization

CLC Number: G302