Optimal Context Window for Chinese Word Sense Disambiguation
Li Gang¹, Kou Guangzeng¹, Xia Chenxi², Quan Ji³, Jiang Donghyok⁴
1 (School of Information Management, Wuhan University, Wuhan 430072, China)
2 (Beijing Science and Technology Information Institute, Beijing 100048, China)
3 (Institute of Systems Engineering, Wuhan University, Wuhan 430072, China)
4 (JengJunTaek WonSan Economic College, WonSan, North Korea)
Abstract: To determine the optimal context range for an ambiguous word, this paper uses cross-validation to identify the optimal context window: among all candidate windows, the one with the lowest error rate is selected. Applying this method to the SemEval-2007 data sets shows that the optimal context window for these data sets is [-2, +2]. To verify this result, a WSD experiment was carried out on the SemEval-2007 test data sets; it shows that Chinese WSD performance improves to a certain extent. The paper also discusses how the optimal context window differs for ambiguous words of different parts of speech.
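The window-selection procedure described in the abstract can be sketched in Python as below. This is a hypothetical illustration, not the authors' implementation: the abstract names neither the classifier, the features, nor the number of folds, so the multinomial Naive Bayes classifier over position-tagged context words, 5-fold cross-validation, and every function name here (`window_features`, `train_nb`, `predict_nb`, `cv_error_rate`, `best_window`) are assumptions. A candidate window [-m, +n] is written as the pair `(m, n)`, so [-2, +2] is `(2, 2)`.

```python
# Hypothetical sketch: classifier, features and fold count are assumptions,
# not taken from the paper. Only the selection rule (lowest CV error) is.
import math
import random
from collections import Counter, defaultdict


def window_features(tokens, idx, left, right):
    """Position-tagged words in the window [-left, +right] around tokens[idx].

    The target word itself is excluded; offsets outside the sentence are skipped.
    """
    feats = []
    for offset in range(-left, right + 1):
        j = idx + offset
        if offset != 0 and 0 <= j < len(tokens):
            feats.append(f"{offset}:{tokens[j]}")
    return feats


def train_nb(examples):
    """Collect sense and feature counts for a multinomial Naive Bayes model."""
    sense_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, sense in examples:
        sense_counts[sense] += 1
        feat_counts[sense].update(feats)
        vocab.update(feats)
    return sense_counts, feat_counts, vocab


def predict_nb(model, feats):
    """Return the sense with the highest log posterior (add-one smoothing)."""
    sense_counts, feat_counts, vocab = model
    total = sum(sense_counts.values())
    best, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        # The extra +1 reserves probability mass for features unseen in training.
        denom = sum(feat_counts[sense].values()) + len(vocab) + 1
        score = math.log(n / total)
        for f in feats:
            score += math.log((feat_counts[sense][f] + 1) / denom)
        if score > best_score:
            best, best_score = sense, score
    return best


def cv_error_rate(instances, left, right, k=5, seed=0):
    """k-fold cross-validated error rate for the window [-left, +right].

    instances: list of (tokens, target_index, sense). A fixed seed gives every
    candidate window the same folds, so their error rates are comparable.
    """
    data = [(window_features(t, i, left, right), s) for t, i, s in instances]
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)
    errors = 0
    for f in range(k):
        held_out = set(order[f::k])
        model = train_nb([data[j] for j in order if j not in held_out])
        errors += sum(predict_nb(model, data[j][0]) != data[j][1] for j in held_out)
    return errors / len(data)


def best_window(instances, candidates, k=5):
    """Pick the (left, right) window with the lowest CV error; ties go to the first listed."""
    rates = {(left, right): cv_error_rate(instances, left, right, k)
             for left, right in candidates}
    return min(rates, key=rates.get), rates


# Toy usage: the left neighbour decides the sense; the right neighbour is constant.
toy = [(["X", "w", "Z"], 1, "A"), (["Y", "w", "Z"], 1, "B")] * 10
window, rates = best_window(toy, [(0, 1), (1, 0), (1, 1)])
print(window, rates)
```

On real data the candidate list would be a range such as `[(1, 1), (2, 2), (3, 3)]`. Windows for individual parts of speech could be obtained by calling `best_window` separately on the instances of each part of speech.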
Received: 04 July 2009
Published: 25 August 2009
Corresponding author:
Kou Guangzeng
E-mail: kouguangzeng@yahoo.com.cn
About author: Li Gang, Kou Guangzeng, Xia Chenxi, Quan Ji, Jang Donghyok