This paper presents a new method of automatic indexing and retrieval. The approach takes advantage of the association of terms with documents (“latent semantic structure”) to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term-document matrix is decomposed into a set of k orthogonal factors. The original matrix can be approximated by a linear combination of these factors. Documents and queries are represented as vectors formed from weighted combinations of the factors. Relevance is predicted by computing the similarity between the query vector and the document vectors.
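The steps summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy vocabulary and counts are invented, the helper names `lsi_fit` and `cosine` are hypothetical, and cosine similarity is assumed as the similarity measure. Documents and the query are both projected onto the k retained left singular vectors, which is one common convention among several.

```python
import numpy as np

# Toy term-document matrix: rows are terms, columns are documents.
# The vocabulary and counts are invented for illustration only.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0, 0],  # car
    [0, 1, 1, 0, 0],  # automobile
    [1, 1, 1, 0, 0],  # engine
    [0, 0, 0, 1, 1],  # flower
    [0, 0, 0, 1, 1],  # petal
], dtype=float)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def lsi_fit(A, k):
    """Truncated SVD of A, keeping the k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


k = 2  # number of orthogonal factors to keep (an assumption for this toy case)
U_k, s_k, Vt_k = lsi_fit(A, k)

# Rank-k approximation of the original matrix from the retained factors.
A_k = U_k @ np.diag(s_k) @ Vt_k

# Documents in factor space: row j is the projection of document j onto U_k
# (equivalently, row j of V_k S_k).
doc_vecs = A.T @ U_k

# A query containing only the term "car", projected the same way.
q = np.array([1, 0, 0, 0, 0], dtype=float)
q_vec = q @ U_k

# Similarity of the query to each document, in factor space and in raw term space.
lsi_sims = [cosine(q_vec, d) for d in doc_vecs]
raw_sims = [cosine(q, A[:, j]) for j in range(A.shape[1])]

for j, (lsi, raw) in enumerate(zip(lsi_sims, raw_sims)):
    print(f"doc {j}: LSI similarity = {lsi:.3f}, raw term similarity = {raw:.3f}")
```

In this toy matrix the second document contains "automobile" and "engine" but not "car", so its raw term similarity to the query is zero, while in the two-factor space it points in the same direction as the query. The two documents about flowers stay essentially orthogonal to the query in both representations.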
Received: 1998-03-30
Published: 1998-08-25
Corresponding author: Feng Xiangyun
About the author: Feng Xiangyun
Cite this article:
Feng Xiangyun. Applying Latent Semantic Indexing to Information Retrieval System [J]. New Technology of Library and Information Service (现代图书情报技术), 1998, 14(4): 19-21.