New Technology of Library and Information Service  2008, Vol. 24 Issue (4): 29-34    DOI: 10.11925/infotech.1003-3513.2008.04.06
An Algorithm of Text Information Filtering Based on Feature Extraction
Yang Zhizhuo, Han Xie
(School of Electronics and Computer Science and Technology, North University of China, Taiyuan 030051, China)

To address the shortcomings of traditional TF-IDF weighting in text filtering, the authors propose a text information filtering algorithm based on feature extraction. The paper briefly analyzes the principles and process of text information filtering, then focuses on the design and implementation of the filtering algorithm. Experimental results show that the new approach significantly outperforms the traditional information filtering method.
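
For orientation, the traditional TF-IDF baseline named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' proposed algorithm: the function names (`tfidf_vectors`, `cosine`, `filter_docs`), the length-normalized term frequency, the cosine similarity measure, and the 0.1 threshold are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (hypothetical helper)."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Weight = relative term frequency * log inverse document frequency.
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_docs(profile, docs, threshold=0.1):
    """Return indices of docs whose similarity to the profile meets the threshold.

    The threshold value is an assumption for illustration only.
    """
    vectors = tfidf_vectors([profile] + docs)
    profile_vec, doc_vecs = vectors[0], vectors[1:]
    return [i for i, v in enumerate(doc_vecs) if cosine(profile_vec, v) >= threshold]


profile = ["text", "filtering", "feature"]
docs = [
    ["text", "filtering", "algorithm"],
    ["stock", "market", "news"],
]
print(filter_docs(profile, docs))
```

In this sketch a document passes the filter only if it shares weighted terms with the user profile, so the second document, which has no terms in common with the profile, is rejected.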

Key words: Information extraction; Information filtering; Text feature extraction; TF-IDF
Received: 19 November 2007      Published: 25 April 2008


Corresponding author: Yang Zhizhuo
About the authors: Yang Zhizhuo, Han Xie

Cite this article:

Yang Zhizhuo, Han Xie. An Algorithm of Text Information Filtering Based on Feature Extraction. New Technology of Library and Information Service, 2008, 24(4): 29-34.


