New Technology of Library and Information Service  2009, Vol. 25 Issue (7-8): 49-53    DOI: 10.11925/infotech.1003-3513.2009.07-08.10
Optimal Context Window for Chinese Word Sense Disambiguation
Li Gang, Kou Guangzeng, Xia Chenxi, Quan Ji³, Jiang Donghyok⁴
1 (School of Information Management, Wuhan University, Wuhan 430072, China)
2 (Beijing Science and Technology Information Institute, Beijing 100048, China)
3 (Institute of Systems Engineering, Wuhan University, Wuhan 430072, China)
4 (JengJunTaek WonSan Economic College, WonSan, North Korea)
Abstract  

 To determine the optimal context range of an ambiguous word, this paper uses cross-validation to identify the optimal context window: among all candidate windows, the best one is the window with the lowest error rate. Applying this method to the SemEval-2007 data sets, it finds that the optimal context window for these data sets is [-2, +2]. To verify this result, a WSD experiment was run on the SemEval-2007 test data sets; it shows that the performance of Chinese WSD improves to a certain extent. The paper also discusses how the optimal context window differs across the parts of speech of ambiguous words.
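The selection procedure described above (score every candidate window by cross-validated error rate, keep the lowest) can be sketched as follows. The abstract does not state the classifier, the features, or the candidate set, so everything here is an assumption for illustration: bag-of-words features taken from the window [-left, +right] around the target word, an add-one-smoothed multinomial naive Bayes classifier, k-fold cross-validation with folds assigned by index modulo k, and candidates covering every window of up to `max_dist` words per side. The function names (`window_features`, `train_nb`, `predict_nb`, `cv_error_rate`, `optimal_window`) are hypothetical, and the toy data is invented, not taken from SemEval-2007.

```python
import math
from collections import Counter, defaultdict


def window_features(tokens, target_idx, left, right):
    """Bag-of-words features from the window [-left, +right] around the target (assumed feature type)."""
    feats = []
    for offset in range(-left, right + 1):
        j = target_idx + offset
        if offset != 0 and 0 <= j < len(tokens):  # skip the target itself and out-of-range positions
            feats.append(tokens[j])
    return feats


def train_nb(examples):
    """Train a multinomial naive Bayes model from (features, sense) pairs (assumed classifier)."""
    sense_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, sense in examples:
        sense_counts[sense] += 1
        feat_counts[sense].update(feats)
        vocab.update(feats)
    return sense_counts, feat_counts, vocab


def predict_nb(model, feats):
    """Return the sense with the highest add-one-smoothed log posterior."""
    sense_counts, feat_counts, vocab = model
    total = sum(sense_counts.values())
    v = len(vocab) + 1  # +1 reserves probability mass for unseen words
    best_sense, best_score = None, -math.inf
    for sense in sorted(sense_counts):  # sorted order makes tie-breaking deterministic
        denom = sum(feat_counts[sense].values()) + v
        score = math.log(sense_counts[sense] / total)
        score += sum(math.log((feat_counts[sense][f] + 1) / denom) for f in feats)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


def cv_error_rate(instances, left, right, k=5):
    """k-fold cross-validated error rate of one window; instances are (tokens, target_idx, sense)."""
    errors = 0
    for fold in range(k):
        train = [x for n, x in enumerate(instances) if n % k != fold]
        test = [x for n, x in enumerate(instances) if n % k == fold]
        model = train_nb([(window_features(t, p, left, right), s) for t, p, s in train])
        errors += sum(predict_nb(model, window_features(t, p, left, right)) != s
                      for t, p, s in test)
    return errors / len(instances)


def optimal_window(instances, max_dist=5, k=5):
    """Try every window [-left, +right] up to max_dist per side; return (error, left, right) of the best."""
    candidates = sorted(
        ((left, right) for left in range(max_dist + 1) for right in range(max_dist + 1)
         if left + right > 0),
        key=lambda w: (w[0] + w[1], w[0]),  # smaller windows are tried first
    )
    best = None
    for left, right in candidates:
        err = cv_error_rate(instances, left, right, k)
        if best is None or err < best[0]:  # strict '<' keeps the smaller window on ties
            best = (err, left, right)
    return best


if __name__ == "__main__":
    # Invented toy data: only the word two positions right of "bank" reveals the sense.
    data = [(["x", "the", "bank", "the", "river" if n % 2 == 0 else "money", "y"], 2,
             "shore" if n % 2 == 0 else "finance") for n in range(12)]
    print(optimal_window(data, max_dist=3, k=3))  # prints (0.0, 0, 2)
```

In the toy run, windows that see only the neighbouring "the" cannot separate the senses (error 0.5), while [0, +2] reaches the cue word and gets every held-out instance right. Grouping the instances by the ambiguous word's part of speech and calling `optimal_window` on each group would be one way to obtain per-POS windows, though the abstract does not say that this is how the paper did it.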

Key words: Word sense disambiguation; Context window; Feature selection; Chinese
Received: 04 July 2009      Published: 25 August 2009
CLC Number: TP391
Corresponding Author: Kou Guangzeng     E-mail: kouguangzeng@yahoo.com.cn
About the authors: Li Gang, Kou Guangzeng, Xia Chenxi, Quan Ji, Jang Donghyok

Cite this article:

Li Gang, Kou Guangzeng, Xia Chenxi, Quan Ji, Jang Donghyok. Optimal Context Window for Chinese Word Sense Disambiguation. New Technology of Library and Information Service, 2009, 25(7-8): 49-53.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2009.07-08.10     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2009/V25/I7-8/49
