Sense Disambiguation of Chinese Segmentation Based on Bi-direction Matching Method and HMM
Mai Fanjin¹, Wang Ting²
¹ Modern Education Technology Center, Guilin University of Technology, Guilin 541004, China
² Department of Electronic and Computer Science, Guilin University of Technology, Guilin 541004, China
Abstract This paper proposes a model for resolving ambiguity in Chinese word segmentation. The model first segments the input with both forward maximum matching (MM) and reverse maximum matching (RMM). It then compares the two segmentation results and produces a more accurate final segmentation. The process has three stages: ambiguity discovery, extraction, and disambiguation. Test results show that the model reduces the segmentation error rate caused by segmentation ambiguity.
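The MM/RMM comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy lexicon and word counts, and the add-one smoothed unigram scorer used for the disambiguation step are all hypothetical, and the paper's HMM-based disambiguation is not reproduced. The text is cut at the boundaries that MM and RMM share; spans where the two results agree are kept, while spans where they disagree count as discovered ambiguities, are extracted, and are resolved by score.

```python
import math
from typing import Dict, List, Set, Tuple

# (start offset, end offset, MM words, RMM words) for one aligned span
Span = Tuple[int, int, List[str], List[str]]


def forward_mm(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Forward maximum matching (MM): scan left to right, taking the longest lexicon word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + size]
            if size == 1 or cand in lexicon:  # a single character is always accepted
                words.append(cand)
                i += size
                break
    return words


def reverse_mm(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Reverse maximum matching (RMM): scan right to left, taking the longest lexicon word."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            cand = text[j - size:j]
            if size == 1 or cand in lexicon:
                words.append(cand)
                j -= size
                break
    return words[::-1]  # words were collected right to left


def _boundaries(words: List[str]) -> Set[int]:
    """Character offsets at which some word starts or ends."""
    out, pos = {0}, 0
    for w in words:
        pos += len(w)
        out.add(pos)
    return out


def _words_in(words: List[str], start: int, end: int) -> List[str]:
    """Words lying entirely inside text[start:end]."""
    out, pos = [], 0
    for w in words:
        if start <= pos and pos + len(w) <= end:
            out.append(w)
        pos += len(w)
    return out


def aligned_spans(fmm: List[str], rmm: List[str]) -> List[Span]:
    """Cut at boundaries shared by both results and pair the MM and RMM words per span."""
    cuts = sorted(_boundaries(fmm) & _boundaries(rmm))
    return [(s, e, _words_in(fmm, s, e), _words_in(rmm, s, e))
            for s, e in zip(cuts, cuts[1:])]


def ambiguous_spans(fmm: List[str], rmm: List[str]) -> List[Span]:
    """Discovery and extraction: the spans on which MM and RMM disagree."""
    return [sp for sp in aligned_spans(fmm, rmm) if sp[2] != sp[3]]


def unigram_score(words: List[str], freq: Dict[str, int]) -> float:
    """Add-one smoothed unigram log-probability (hypothetical stand-in, not the paper's HMM).

    All unseen words share one extra vocabulary slot, a simplifying assumption.
    """
    total, vocab = sum(freq.values()), len(freq) + 1
    return sum(math.log((freq.get(w, 0) + 1) / (total + vocab)) for w in words)


def segment(text: str, lexicon: Set[str], freq: Dict[str, int], max_len: int = 4) -> List[str]:
    fmm = forward_mm(text, lexicon, max_len)
    rmm = reverse_mm(text, lexicon, max_len)
    result: List[str] = []
    for _, _, f, r in aligned_spans(fmm, rmm):
        # Agreeing spans are kept as-is; for a disagreeing span the higher-scoring
        # candidate wins (max() returns the MM candidate on a tie).
        result.extend(f if f == r else max((f, r), key=lambda c: unigram_score(c, freq)))
    return result


if __name__ == "__main__":
    lex = {"研究", "研究生", "生命", "起源"}
    freq = {"研究": 100, "研究生": 20, "生命": 60, "起源": 30}  # toy counts, invented
    text = "研究生命起源"
    print(ambiguous_spans(forward_mm(text, lex, 3), reverse_mm(text, lex, 3)))
    # → [(0, 4, ['研究生', '命'], ['研究', '生命'])]
    print(segment(text, lex, freq, max_len=3))
    # → ['研究', '生命', '起源']
```

On the classic fragment 研究生命起源, MM yields 研究生/命/起源 and RMM yields 研究/生命/起源, so only the first four characters are flagged as ambiguous. Comparing MM and RMM can only detect ambiguities on which the two methods disagree; a fragment that both segment the same wrong way passes through undetected.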
Received: 25 April 2008
Published: 25 August 2008
Corresponding author:
Wang Ting
E-mail: 328dickwong1981@163.com
About the authors: Mai Fanjin, Wang Ting