New Technology of Library and Information Service, 2008, Vol. 24, Issue (8): 37-41. DOI: 10.11925/infotech.1003-3513.2008.08.06
Sense Disambiguation of Chinese Segmentation Based on Bi-direction Matching Method and HMM
Mai Fanjin1, Wang Ting2
1(Modern Education Technology Center, Guilin University of Technology, Guilin 541004, China)
2(Department of Electronic and Computer Science, Guilin University of Technology, Guilin 541004, China)
Abstract  

This paper proposes a model for eliminating ambiguity in Chinese word segmentation. The model first segments the text with the forward maximum matching (MM) and reverse maximum matching (RMM) methods, then compares the two segmentation results and outputs the more accurate one. The process consists of three stages: discovery, extraction, and disambiguation. Test results show that the model reduces the segmentation error rate caused by segmentation ambiguity.
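The abstract describes the pipeline only at a high level. The following is a minimal sketch of a bidirectional-matching segmenter under stated assumptions, not the authors' implementation: `LEXICON`, `COUNTS`, `MAX_WORD_LEN`, and every function name are hypothetical, and the toy data is invented for illustration. It runs forward (MM) and reverse (RMM) maximum matching, discovers and extracts the spans where the two results disagree, and resolves each span by keeping the candidate with the higher add-one-smoothed unigram log-probability. That scoring step is a simple stand-in chosen for brevity; it is not the HMM-based model named in the title.

```python
import math
from collections import Counter

# Toy data for illustration only (hypothetical; not the paper's lexicon or corpus).
LEXICON = {"研究", "研究生", "生命", "起源"}
COUNTS = Counter({"研究": 50, "生命": 30, "研究生": 20, "起源": 10})
MAX_WORD_LEN = 4


def forward_mm(text, lexicon, max_len=MAX_WORD_LEN):
    """Forward maximum matching (MM): take the longest dictionary word from the left."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # a single character always matches
                words.append(piece)
                i += size
                break
    return words


def reverse_mm(text, lexicon, max_len=MAX_WORD_LEN):
    """Reverse maximum matching (RMM): take the longest dictionary word from the right."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    return words[::-1]  # collected right-to-left, so restore reading order


def boundaries(words):
    """Character offsets at which each word ends."""
    ends, pos = set(), 0
    for w in words:
        pos += len(w)
        ends.add(pos)
    return ends


def words_in_span(words, start, end):
    """Words lying entirely inside text[start:end]."""
    out, pos = [], 0
    for w in words:
        if start <= pos and pos + len(w) <= end:
            out.append(w)
        pos += len(w)
    return out


def ambiguous_spans(fwd, rev):
    """Discovery and extraction: spans between shared boundaries where MM and RMM disagree."""
    spans, start = [], 0
    for end in sorted(boundaries(fwd) & boundaries(rev)):
        if words_in_span(fwd, start, end) != words_in_span(rev, start, end):
            spans.append((start, end))
        start = end
    return spans


def log_score(words, counts, total, vocab):
    """Add-one-smoothed unigram log-probability (a stand-in, not the paper's HMM)."""
    return sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)


def segment(text, lexicon, counts):
    """Disambiguation: keep agreed words; for each conflict keep the higher-scoring candidate."""
    fwd, rev = forward_mm(text, lexicon), reverse_mm(text, lexicon)
    total, vocab = sum(counts.values()), len(counts) + 1
    result, start = [], 0
    for end in sorted(boundaries(fwd) & boundaries(rev)):
        a, b = words_in_span(fwd, start, end), words_in_span(rev, start, end)
        if a == b:
            result.extend(a)
        else:
            result.extend(max(a, b, key=lambda ws: log_score(ws, counts, total, vocab)))
        start = end
    return result


if __name__ == "__main__":
    text = "研究生命起源"
    fwd, rev = forward_mm(text, LEXICON), reverse_mm(text, LEXICON)
    print(fwd, rev)                        # ['研究生', '命', '起源'] ['研究', '生命', '起源']
    print(ambiguous_spans(fwd, rev))       # [(0, 4)]
    print(segment(text, LEXICON, COUNTS))  # ['研究', '生命', '起源']
```

On the classic string 研究生命起源, MM yields 研究生/命/起源 while RMM yields 研究/生命/起源. The two agree only on the boundary before 起源, so the span covering the first four characters is flagged as ambiguous and the RMM candidate wins under the toy counts. Any richer model (for example a smoothed bigram or an HMM) could replace `log_score` without changing the discovery and extraction steps.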

Key words: Word segmentation; Maximum matching method; HMM; Sense disambiguation
Received: 25 April 2008      Published: 25 August 2008
CLC Number: TP391.1
Corresponding Author: Wang Ting, E-mail: 328dickwong1981@163.com
About the authors: Mai Fanjin, Wang Ting

Cite this article:

Mai Fanjin, Wang Ting. Sense Disambiguation of Chinese Segmentation Based on Bi-direction Matching Method and HMM. New Technology of Library and Information Service, 2008, 24(8): 37-41.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2008.08.06     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2008/V24/I8/37
