农业图书情报学刊 (Journal of Library and Information Sciences in Agriculture) ›› 2016, Vol. 28 ›› Issue (11): 50-53. doi: 10.13998/j.cnki.issn1002-1248.2016.11.012

• Literature Research •

Research on Text Classification Model Based on Random Forests
(基于随机森林的文本分类模型研究)

LUO Xin (罗新)

  1. School of Business Administration, South China University of Technology, Guangzhou, Guangdong 510640, China
  • Received: 2016-05-17 Online: 2016-11-05 Published: 2016-11-08
  • About the author: LUO Xin (b. 1984), female, librarian, master's degree. Research interests: text classification, natural language processing.
  • Supported by:
    Theoretical and Experimental Research on Multidisciplinary Collaborative Modeling for Text Classification (Project No. 71373291)

Abstract: Text classification is a key technology for processing large volumes of text data and can go a long way toward solving the problems caused by the "information explosion". The random forests algorithm proposed by Breiman generalizes well, is robust, is insensitive to noise, and can handle continuous attributes, which makes it well suited to building text classification models. The author tentatively introduces the random forests algorithm into text classification, builds a text classification model based on random forests, and tests and compares it on the standard text test collection Reuters-21578. The results show that: (1) the model can be applied effectively to text classification; (2) compared with text classification models based on CART, REPTree and J48, the model based on random forests performs best, reaching an F1-Measure of 0.777; (3) the model based on random forests is convenient to operate, intuitive and effective, and gives reliable evaluation results, offering a new approach for text classification research.

Key words: text classification, random forests, CART tree
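The F1-Measure reported in the abstract is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation for a single category. The function name `f1_for_class` and the six example labels are hypothetical and chosen only for illustration; they are not data from the paper, and the abstract does not say how F1 was averaged over categories.

```python
def f1_for_class(y_true, y_pred, positive):
    """Return (precision, recall, F1) with one category treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted categories for six documents (illustrative only).
y_true = ["earn", "earn", "acq", "acq", "earn", "crude"]
y_pred = ["earn", "acq", "acq", "acq", "earn", "earn"]

precision, recall, f1 = f1_for_class(y_true, y_pred, "earn")
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In a multi-class setting, a single figure such as 0.777 would typically come from averaging per-category scores like these, although the averaging scheme used in the paper is not stated here.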

CLC number:

  • TP391

Cite this article

LUO Xin. Research on Text Classification Model Based on Random Forests[J]. 农业图书情报学刊 (Journal of Library and Information Sciences in Agriculture), 2016, 28(11): 50-53.