New Technology of Library and Information Service  2009, Vol. 25 Issue (7-8): 49-53    DOI: 10.11925/infotech.1003-3513.2009.07-08.10
Optimal Context Window for Chinese Word Sense Disambiguation
Li Gang, Kou Guangzeng, Xia Chenxi, Quan Ji3, Jang Donghyok4
1 (School of Information Management, Wuhan University, Wuhan 430072, China)
2 (Beijing Science and Technology Information Institute, Beijing 100048, China)
3 (Institute of Systems Engineering, Wuhan University, Wuhan 430072, China)
4 (JengJunTaek WonSan Economic College, WonSan, North Korea)
Abstract  

To determine the optimal context range for an ambiguous word, the paper uses cross-validation to select the context window: among all candidate windows, the best is the one with the lowest error rate. Applying this method to the SemEval-2007 data sets, it finds that the optimal context window for these data is [-2, +2]. To verify this result, a WSD test is run on the SemEval-2007 test data, which shows that the performance of Chinese WSD improves to some extent. The paper also discusses how the optimal context window differs across the parts of speech of the ambiguous word.
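The selection procedure described in the abstract can be sketched as follows. This is only a minimal illustration, not the paper's implementation: the add-one-smoothed Naive Bayes classifier, the bag-of-words features, the fold assignment, and all function names (`window_features`, `cv_error`, `best_window`) are assumptions introduced here. Each instance is assumed to be a tuple of segmented tokens, the index of the ambiguous word, and its sense label.

```python
import math
from collections import Counter, defaultdict

def window_features(tokens, target_idx, left, right):
    """Words within [-left, +right] of the target, excluding the target itself."""
    start = max(0, target_idx - left)
    end = min(len(tokens), target_idx + right + 1)
    return [tokens[i] for i in range(start, end) if i != target_idx]

def train_nb(samples):
    """Collect the counts needed by a Naive Bayes classifier (assumed classifier)."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for feats, sense in samples:
        sense_counts[sense] += 1
        for w in feats:
            word_counts[sense][w] += 1
            vocab.add(w)
    return sense_counts, word_counts, vocab

def predict_nb(model, feats):
    """Return the sense with the highest add-one-smoothed log posterior."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense in sorted(sense_counts):  # sorted so ties break deterministically
        denom = sum(word_counts[sense].values()) + len(vocab) + 1
        score = math.log(sense_counts[sense] / total)
        for w in feats:
            score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

def cv_error(instances, window, k=5):
    """k-fold cross-validated error rate for one candidate window (left, right)."""
    left, right = window
    data = [(window_features(t, i, left, right), s) for t, i, s in instances]
    errors = 0
    for fold in range(k):
        train = [d for j, d in enumerate(data) if j % k != fold]
        test = [d for j, d in enumerate(data) if j % k == fold]
        model = train_nb(train)
        errors += sum(predict_nb(model, f) != s for f, s in test)
    return errors / len(data)

def best_window(instances, candidates, k=5):
    """Pick the candidate window with the lowest cross-validated error rate."""
    return min(candidates, key=lambda w: cv_error(instances, w, k))
```

With candidates such as `[(1, 1), (2, 2), (3, 3)]`, `best_window` returns the window whose cross-validated error rate is lowest; a window of `(2, 2)` corresponds to the [-2, +2] notation used in the abstract.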

Key words: Word sense disambiguation; Context window; Feature selection; Chinese
Received: 04 July 2009      Published: 25 August 2009
CLC number: TP391
Corresponding Authors: Kou Guangzeng     E-mail: kouguangzeng@yahoo.com.cn
About authors: Li Gang, Kou Guangzeng, Xia Chenxi, Quan Ji, Jang Donghyok

Cite this article:

Li Gang, Kou Guangzeng, Xia Chenxi, Quan Ji, Jang Donghyok. Optimal Context Window for Chinese Word Sense Disambiguation. New Technology of Library and Information Service, 2009, 25(7-8): 49-53.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2009.07-08.10     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2009/V25/I7-8/49

