农业图书情报学刊, 2016, Vol. 28, Issue 7: 5-9. doi: 10.13998/j.cnki.issn1002-1248.2016.07.001

• Information Forum •

Analysis of the Latent Semantic Indexing Text Mining Method (LSI文本挖掘技术剖析)

CAI Hao-yuan (蔡豪源)

  1. Guangzhou Library, Guangzhou, Guangdong 510623, China
  • Received: 2016-01-16 Online: 2016-07-05 Published: 2016-07-11
  • About the author: CAI Hao-yuan (1987-), male, assistant librarian at Guangzhou Library.

Abstract: This paper introduces the application of latent semantic indexing (LSI) in information retrieval. It describes three methods of term weighting, analyzes the role of singular value decomposition (SVD) in extracting the important information from a matrix, and shows how a reduced-rank approximation of the term-document matrix simulates the way humans understand meaning. It then compares the search algorithms of the vector space model (VSM) and LSI and, through an example of text mining on a term-document matrix, shows how LSI helps reveal the intrinsic connections between documents.

Key words: latent semantic indexing, text mining, vector space model, singular value decomposition
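The pipeline summarized in the abstract (term weighting, SVD, reduced-rank approximation, and VSM versus LSI search) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy corpus, the helper names (`tfidf`, `lsi_fit`, `fold_in`), the choice of tf-idf as the weighting scheme (the paper's three schemes are not specified on this page), and the rank `k = 2`.

```python
import numpy as np

# Hypothetical toy corpus (not from the paper): rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
counts = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 0],   # petal
], dtype=float)

def tfidf(counts):
    """Weight raw counts by tf * log(N / df); one common scheme, assumed here."""
    n_docs = counts.shape[1]
    df = np.count_nonzero(counts, axis=1)   # number of documents containing each term
    idf = np.log(n_docs / df)
    return counts * idf[:, None]

def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def lsi_fit(A, k):
    """Truncated SVD: keep the k largest singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in(q, U_k, s_k):
    """Project a term-space vector into the k-dimensional latent space."""
    return (U_k.T @ q) / s_k

A = tfidf(counts)
k = 2
U_k, s_k, Vt_k = lsi_fit(A, k)
A_k = U_k @ np.diag(s_k) @ Vt_k             # rank-k approximation of A

# Query containing only the term "car".
q = np.zeros(len(terms))
q[terms.index("car")] = 1.0

# VSM: compare the query with each document column in the original term space.
vsm_scores = [cosine(q, A[:, j]) for j in range(A.shape[1])]

# LSI: compare the folded-in query with each document in the latent space.
q_k = fold_in(q, U_k, s_k)
lsi_scores = [cosine(q_k, Vt_k[:, j]) for j in range(A.shape[1])]

print("VSM:", np.round(vsm_scores, 3))
print("LSI:", np.round(lsi_scores, 3))
```

In this toy setup the second document contains "automobile" and "engine" but never "car", so plain VSM matching gives it no credit for a "car" query, whereas the latent space places it alongside the first document because the two share the term "engine".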

CLC number:

  • TP391.3

Cite this article

蔡豪源. LSI文本挖掘技术剖析[J]. 农业图书情报学刊, 2016, 28(7): 5-9.

CAI Hao-yuan. Analysis of the Latent Semantic Indexing Text Mining Method[J]. 农业图书情报学刊, 2016, 28(7): 5-9.