
2016, Vol. 28, Issue (11): 50-53. doi: 10.13998/j.cnki.issn1002-1248.2016.11.012

• Literature study

Research on Text Classification Model Based on Random Forests

LUO Xin   

  1. School of Business Administration, South China University of Technology, Guangzhou, Guangdong 510640, China
  • Received:2016-05-17 Online:2016-11-05 Published:2016-11-08

Abstract: Text classification is a key technology for processing large amounts of text data, and it can alleviate the information explosion problem to a certain extent. The random forests algorithm proposed by Breiman offers good generalization and robustness, is insensitive to noise, and can handle continuous attributes, which makes it well suited to building text classification models. This paper constructs a text classification model based on the random forests algorithm and evaluates it on the Reuters-21578 text categorization collection to verify the model's validity and classification accuracy. The results show that the model can be applied well to text classification; that, compared with the CART, REPTree and J48 models, it performed best, with an F1-Measure of 0.777; and that it is easy, intuitive and effective to operate and gives reliable results, providing a new idea for text classification research.
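As a reading aid for the F1-Measure reported above, the following is a minimal sketch of how a per-class and a support-weighted F1 score can be computed from true and predicted labels. It is not the paper's evaluation code: the function names, the toy labels, and the choice of weighted averaging are assumptions made for illustration, since the abstract does not state how the 0.777 figure was averaged.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} from parallel lists of true and predicted labels."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its true count.

    Assumption: weighted averaging is used here only as one common choice.
    """
    support = Counter(y_true)
    scores = per_class_f1(y_true, y_pred)
    return sum(scores[label] * support[label] for label in support) / len(y_true)


# Hypothetical toy labels (illustrative only, not data from the paper).
y_true = ["earn", "earn", "acq", "acq", "grain", "earn"]
y_pred = ["earn", "acq", "acq", "acq", "grain", "earn"]

print(per_class_f1(y_true, y_pred))
print(round(weighted_f1(y_true, y_pred), 3))
```

The same functions can be applied to the predictions of any classifier, so scores for a random forest and for single-tree baselines can be compared on equal terms.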

Key words: Random forests

CLC Number: 

  • TP391
[1] 刘怀亮, 张治国,马志辉,孙蕾.基于SVM与KNN的中文文本分类比较实证研究[J].情报理论与实践,2008,31(6):941-944.
[2] 杜选.基于加权补集的朴素贝叶斯文本分类算法研究[J].计算机应用与软件,2014,31(9):253-255.
[3] Breiman L. Random Forests[J].Machine Learning,2001,45(1):5-32.
[4] 吴潇雨,和敬涵,张沛,胡骏.基于灰色投影改进随机森林算法的电力系统短期负荷预测[J].电力系统自动化,2015,39(12):50-55.
[5] 杨帆,林琛,周绮凤,符长虹,罗林开.基于随机森林的潜在k近邻算法及其在基因表达数据分类中的应用[J].系统工程理论与实践,2012,32(4):815-825.
[6] 詹曙,姚尧,高贺. 基于随机森林的脑磁共振图像分类[J].电子测量与仪器学报,2013,27(11):1067-1072.
[7] 赖成光,陈晓宏,赵仕威,王兆礼,吴旭树.基于随机森林的洪灾风险评价模型及其应用[J].水利学报,2015,46(1):58-66.
[8] Breiman L, Friedman J, Olshen R, et al. Classification and Regression Trees[M]. New York: Chapman & Hall, 1984.
[9] Breiman L. Bagging Predictors[J]. Machine Learning, 1996, 24(2): 123-140.