New Technology of Library and Information Service  2008, Vol. 24 Issue (4): 29-34    DOI: 10.11925/infotech.1003-3513.2008.04.06
An Algorithm of Text Information Filtering Based on Feature Extraction
Yang Zhizhuo, Han Xie
(School of Electronics and Computer Science and Technology, North University of China, Taiyuan 030051, China)
Abstract  

To address the shortcomings of traditional TF-IDF weighting in text filtering, the authors propose a text information filtering algorithm based on feature extraction. The paper briefly reviews the principles and workflow of text information filtering, then focuses on the design and implementation of the filtering algorithm. Experimental results show that the new approach significantly outperforms the traditional information filtering method.

Key words: Information extraction; Information filtering; Text feature extraction; TF-IDF
Received: 19 November 2007      Published: 25 April 2008
CLC Number: TP391
Corresponding Authors: Yang Zhizhuo     E-mail: yangzhizhuo_662@163.com
About authors: Yang Zhizhuo, Han Xie

Cite this article:

Yang Zhizhuo, Han Xie. An Algorithm of Text Information Filtering Based on Feature Extraction. New Technology of Library and Information Service, 2008, 24(4): 29-34.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2008.04.06     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2008/V24/I4/29

