Nested Vector Segmentation Technique in Knowledge Extraction
Hua Bolin, Zhao Liang
(Institute of Scientific and Technical Information of China, Beijing 100038, China)

Abstract The well-known maximum matching algorithm is implemented in the knowledge extraction process, and conclusions are drawn about the key techniques of vector segmentation. Nested vector segmentation is designed and implemented to overcome the shortcomings of single-pass scanning. Experiments show that applying nested vector segmentation to knowledge extraction not only improves precision and recall, resolving the word-within-word problem at its root, but also simplifies the subsequent syntactic analysis.
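
As a rough illustration of the ideas named in the abstract, the following Python sketch shows greedy forward maximum matching followed by a second, nested pass that looks for shorter lexicon words inside each matched word. The abstract does not describe the authors' actual data structures or scanning rules, so the function names `forward_max_match` and `nested_segment`, the set-based lexicon, and the nesting rule itself are assumptions made for illustration only.

```python
def forward_max_match(text, lexicon):
    """Greedy forward maximum matching over a set-based lexicon (assumed form)."""
    # Longest lexicon entry bounds the window; never let the window drop below 1.
    max_len = max([len(word) for word in lexicon] + [1])
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


def nested_segment(text, lexicon):
    """Hypothetical nested pass: pair each top-level token with the
    shorter lexicon words (length >= 2) found inside it."""
    result = []
    for token in forward_max_match(text, lexicon):
        inner = [
            token[start:end]
            for start in range(len(token))
            for end in range(start + 2, len(token) + 1)
            if token[start:end] != token and token[start:end] in lexicon
        ]
        result.append((token, inner))
    return result
```

With a lexicon containing 知识, 抽取, 知识抽取 and 技术, a single pass over 知识抽取技术 yields only 知识抽取 and 技术, while the nested pass in this sketch also reports 知识 and 抽取 inside the first token. That is one plausible reading of how a nested scan could address words contained within longer words.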

Received: 11 May 2007
Published: 25 July 2007

Corresponding author: Hua Bolin
E-mail: huabolin@istic.ac.cn

About the authors: Hua Bolin, Zhao Liang