New Technology of Library and Information Service  2006, Vol. 1 Issue (5): 13-17    DOI: 10.11925/infotech.1003-3513.2006.05.04
Study of Self-adaptive Matching Method in Chinese Segmentation Based on Decided Vocabulary
Huang Shuiqing, Cheng Chong
(College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China)
Abstract  

This paper presents a self-adaptive matching algorithm for Chinese word segmentation. Working from a decided vocabulary, the algorithm identifies the Chinese words listed in the vocabulary and also automatically identifies unlisted words that are absent from it. A test comparing the algorithm with the Reverse Maximum Matching Method and with several methods for identifying unlisted words shows that it segments unknown words effectively and reduces segmentation errors, with little effect on segmentation efficiency.
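For orientation, the Reverse Maximum Matching baseline named in the abstract can be sketched as follows. This is a minimal illustration of the standard technique, not the authors' self-adaptive method, which the abstract does not specify. The function name, the `max_word_len` parameter, and the toy vocabulary are assumptions made for the example.

```python
def reverse_maximum_matching(text, vocabulary, max_word_len=4):
    """Segment `text` right to left, preferring the longest vocabulary word."""
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate ending at `end` first, then shrink it.
        for size in range(min(max_word_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single character is always accepted as a fallback, so a word
            # missing from the vocabulary breaks into isolated characters.
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                end -= size
                break
    words.reverse()
    return words


# Hypothetical toy vocabulary, for illustration only.
vocabulary = {"研究", "研究生", "生命", "起源"}
print(reverse_maximum_matching("研究生命的起源", vocabulary))
# prints ['研究', '生命', '的', '起源']
```

The single-character fallback shows why a purely vocabulary-driven matcher struggles with unlisted words: anything outside the vocabulary is cut into individual characters, which is the kind of error the paper's method is reported to reduce.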

Key words: Automatic segmentation; New word identification; Unlisted words
Received: 01 December 2005      Published: 25 May 2006
CLC number: TP391
Corresponding Authors: Huang Shuiqing     E-mail: sqhuang@njau.edu.cn
About the authors: Huang Shuiqing, Cheng Chong

Cite this article:

Huang Shuiqing, Cheng Chong. Study of Self-adaptive Matching Method in Chinese Segmentation Based on Decided Vocabulary. New Technology of Library and Information Service, 2006, 1(5): 13-17.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2006.05.04     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2006/V1/I5/13

