Research on a New Text Automatic Indexing Technology Based on Digital Library
Wang Lancheng1, Wang Lishuang2
1 (Department of Information Management, Nanjing Political College PLA, Shanghai 200433, China)
2 (Wanfang Data Co., Ltd, Beijing 100044, China)
Abstract A semantic-environment technique controlled by the position information of special stop-words has been studied and established. The technique has been applied to the automatic indexing of Chinese CXMARC metadata text and to the mining of subject (theme) information. The SWF algorithm, used as a preprocessing step in the automatic indexing of specialized Chinese text, efficiently reduces word-segmentation ambiguity within a field and shortens indexing time. As a result, both the quality and the efficiency of the traditional maximum matching algorithm are improved.
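The general idea of combining stop-word preprocessing with maximum matching can be illustrated with a minimal sketch. This is not the authors' SWF algorithm, whose details are not given in the abstract. It only assumes that text is first cut at stop-word positions and that each remaining chunk is then segmented by greedy forward maximum matching. The function names, the toy dictionary, and the stop-word list are all hypothetical.

```python
def split_on_stop_words(text, stop_words):
    """Cut text into chunks at every stop-word occurrence; stop words are dropped."""
    chunks, current, i = [], "", 0
    # Try longer stop words first so a short one does not pre-empt a long one.
    ordered = sorted(stop_words, key=len, reverse=True)
    while i < len(text):
        hit = next((w for w in ordered if text.startswith(w, i)), None)
        if hit:
            if current:
                chunks.append(current)
            current = ""
            i += len(hit)
        else:
            current += text[i]
            i += 1
    if current:
        chunks.append(current)
    return chunks


def forward_max_match(chunk, dictionary, max_len=4):
    """Greedy forward maximum matching: take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(chunk):
        for size in range(min(max_len, len(chunk) - i), 0, -1):
            candidate = chunk[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def index_terms(text, dictionary, stop_words, max_len=4):
    """Pre-split on stop words, then segment each chunk by maximum matching."""
    terms = []
    for chunk in split_on_stop_words(text, stop_words):
        terms.extend(forward_max_match(chunk, dictionary, max_len))
    return terms


# Toy data (assumed for illustration only).
dictionary = {"中国", "的确", "确定", "位置"}
stop_words = {"的"}
text = "中国的确定位置"

plain = forward_max_match(text, dictionary)         # ['中国', '的确', '定', '位置']
presplit = index_terms(text, dictionary, stop_words)  # ['中国', '确定', '位置']
```

In this toy case, plain maximum matching mis-segments the string because "的确" is a dictionary word, while cutting at the stop word "的" first removes the ambiguous overlap. Blind splitting would of course break genuine occurrences of words such as "的确", which is presumably why the abstract stresses control by stop-word position information rather than simple removal.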
Received: 13 September 2005
Published: 25 February 2005
Corresponding author: Wang Lancheng
E-mail: wanglancheng@163.com
About the authors: Wang Lancheng, Wang Lishuang