Stop-word Processing Technique in Knowledge Extraction
Hua Bolin
(Institute of Scientific and Technical Information of China, Beijing 100038, China)

Abstract: In knowledge extraction, stop words must be indexed before word segmentation. The key techniques in stop-word processing are selecting stop words, acquiring and organizing stop-word lists, and matching stop words in text. Recognizing stop words requires a stop-word list to be constructed first, and recognizing false stop words during processing can reduce noise. Experiments show that stop-word processing not only saves segmentation time but also improves the efficiency of subsequent syntactic analysis.
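The abstract names three steps: building a stop-word list, matching stop words in the raw text before segmentation, and recognizing false stop words. The Python sketch below illustrates one way these steps could fit together; it is not the author's implementation. It assumes forward longest matching, a tiny hypothetical stop-word list (`STOP_WORDS`), and a hypothetical list of protected words (`PROTECTED_WORDS`). It also reads a "false stop word" as a stop-word character that belongs to a longer content word, such as 的 in 目的 ("purpose"). Matched stop words split the text into shorter chunks, so a downstream segmenter has less to process.

```python
from typing import Iterable

# Hypothetical sample lists, for illustration only; a real system would
# load a curated stop-word list and a full segmentation lexicon.
STOP_WORDS = {"的", "了", "在", "是", "和"}
# Words that contain a stop-word character but must not be split:
# a stop-word match inside one of these is treated as a false stop word.
PROTECTED_WORDS = {"的确", "目的", "了解", "现在", "和平"}


def _longest_match(text: str, start: int, vocab: set[str], max_len: int) -> str:
    """Return the longest entry of vocab that begins at text[start], or ''."""
    for length in range(min(max_len, len(text) - start), 0, -1):
        candidate = text[start:start + length]
        if candidate in vocab:
            return candidate
    return ""


def split_on_stop_words(
    text: str,
    stop_words: Iterable[str] = STOP_WORDS,
    protected: Iterable[str] = PROTECTED_WORDS,
) -> tuple[list[str], list[str]]:
    """Mark stop words before segmentation (hypothetical helper).

    Returns (chunks, stops): the stop-word-free chunks that a segmenter
    would still have to process, and the stop words that were removed.
    """
    stop_set, prot_set = set(stop_words), set(protected)
    max_stop = max(map(len, stop_set), default=0)
    max_prot = max(map(len, prot_set), default=0)

    chunks: list[str] = []
    stops: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(text):
        # 1. Protected words are checked first, so a stop-word character
        #    inside one of them (a false stop word) is kept as content.
        word = _longest_match(text, i, prot_set, max_prot)
        if word:
            current.append(word)
            i += len(word)
            continue
        # 2. Otherwise a stop word closes the current chunk.
        stop = _longest_match(text, i, stop_set, max_stop)
        if stop:
            if current:
                chunks.append("".join(current))
                current = []
            stops.append(stop)
            i += len(stop)
            continue
        # 3. Ordinary character: keep accumulating the current chunk.
        current.append(text[i])
        i += 1
    if current:
        chunks.append("".join(current))
    return chunks, stops
```

For example, `split_on_stop_words("我们的目的是了解现在的情况")` returns the chunks `['我们', '目的', '了解现在', '情况']` and the stop words `['的', '是', '的']`. Calling it with `protected=()` instead yields `['我们', '目', '解现', '情况']`, tearing 目的 and 了解 apart: this is the kind of noise that recognizing false stop words is meant to prevent.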
Received: 11 May 2007
Published: 25 August 2007
Corresponding author:
Hua Bolin
E-mail: huabolin@istic.ac.cn
About the author: Hua Bolin