New Technology of Library and Information Service  2007, Vol. 2 Issue (8): 48-51    DOI: 10.11925/infotech.1003-3513.2007.08.11
Stop-word Processing Technique in Knowledge Extraction
Hua Bolin
(Institute of Scientific and Technical Information of China, Beijing 100038, China)
Abstract  

In knowledge extraction, it is indispensable to index stop words before word segmentation. The key techniques of stop-word processing are selecting stop words, acquiring and organizing stop-word lists, and matching stop words. Recognizing stop words requires constructing a stop-word list. During processing, recognizing false stop words reduces noise. Experiments show that stop-word processing not only saves segmentation time but also improves the efficiency of the subsequent syntactic analysis.
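The matching step described above can be sketched as follows. This is only an illustration under stated assumptions, not the paper's algorithm, which the abstract does not specify: the function name `split_on_stop_words`, the sample sentence, the word lists, and the rule that a match is "false" when it falls inside a listed content word are all hypothetical.

```python
def split_on_stop_words(text, stop_words, protected_words=()):
    """Split text into chunks at stop-word matches, skipping "false" matches.

    Assumption of this sketch: a stop-word match is false when it lies
    inside an occurrence of one of the protected_words (content words
    that happen to contain a stop word).
    """
    # Mark every character position covered by a protected word.
    protected = [False] * len(text)
    for word in protected_words:
        start = text.find(word)
        while start != -1:
            for k in range(start, start + len(word)):
                protected[k] = True
            start = text.find(word, start + 1)

    # Try longer stop words first so that the longest match wins.
    ordered = sorted(stop_words, key=len, reverse=True)

    chunks, current, i = [], [], 0
    while i < len(text):
        match = None
        for sw in ordered:
            if text.startswith(sw, i) and not any(protected[i:i + len(sw)]):
                match = sw
                break
        if match:
            # A true stop word ends the current chunk and is dropped.
            if current:
                chunks.append("".join(current))
                current = []
            i += len(match)
        else:
            current.append(text[i])
            i += 1
    if current:
        chunks.append("".join(current))
    return chunks


# Hypothetical sample: the stop word "的" inside "目的" is a false stop word.
sentence = "我们的目的是提取知识"
stops = {"我们", "的", "是"}
print(split_on_stop_words(sentence, stops, protected_words={"目的"}))
# prints ['目的', '提取知识']
```

Splitting at stop words first leaves the segmenter with shorter chunks to process, which is one plausible reading of how stop-word processing could save segmentation time. Without the protected list, the same call returns `['目', '提取知识']`, showing the kind of noise a false match introduces.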

Key words: Knowledge extraction; Stop-word; Chinese segmentation; Natural language processing; Text information analysis
Received: 11 May 2007      Published: 25 August 2007
Classification: TP391; G356
Corresponding author: Hua Bolin, E-mail: huabolin@istic.ac.cn
About the author: Hua Bolin

Cite this article:

Hua Bolin. Stop-word Processing Technique in Knowledge Extraction. New Technology of Library and Information Service, 2007, 2(8): 48-51.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2007.08.11     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2007/V2/I8/48

