New Technology of Library and Information Service  2012, Vol. 28 Issue (3): 27-34    DOI: 10.11925/infotech.1003-3513.2012.03.05
Contrast Analysis of Methods and Tools for Lemmatization
Wu Sizhu, Qian Qing, Hu Tiejun, Li Danya, Li Junlian, Hong Na
Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China
Abstract  Combining theory with practice, this paper compares methods and tools for lemmatization in word normalization. It summarizes the categories of lemmatization methods and analyzes their features and disadvantages. It then compares seven tools in terms of underlying principle, POS tagger, lexicon, programming language, supported languages, and spell checker. Experiments on datasets from WordSmith Tools evaluate five lemmatizers, and a comparison of the results shows that the Specialist NLP Tools perform better than the others. The paper offers guidance for choosing an appropriate lemmatization method and tool.
Key words: Word normalization; Stemming; Lemmatization; Lemma
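
The kind of comparison the abstract describes, scoring several lemmatizers against a reference list of word forms and lemmas, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the gold pairs, the two toy lemmatizers (`suffix_stripper`, `lexicon_lemmatizer`), and the use of plain accuracy as the score. The abstract does not state the paper's actual datasets entries, tools' internals, or evaluation metric.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical gold standard: (inflected form, expected lemma) pairs.
# These are illustrative only and are not taken from the paper's datasets.
GOLD: List[Tuple[str, str]] = [
    ("studies", "study"),
    ("running", "run"),
    ("mice", "mouse"),
    ("better", "good"),
    ("cats", "cat"),
]

def suffix_stripper(word: str) -> str:
    """Toy stemming-style normalizer: strip the first matching suffix."""
    for suffix in ("ing", "ies", "s"):
        # Require a few remaining characters so short words stay intact.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Hypothetical lexicon mapping known forms to their lemmas.
LEXICON: Dict[str, str] = {
    "studies": "study",
    "running": "run",
    "mice": "mouse",
}

def lexicon_lemmatizer(word: str) -> str:
    """Toy lexicon-based lemmatizer that falls back to suffix stripping."""
    return LEXICON.get(word, suffix_stripper(word))

def accuracy(lemmatize: Callable[[str], str],
             gold: List[Tuple[str, str]]) -> float:
    """Fraction of gold forms for which the predicted lemma is correct."""
    correct = sum(1 for form, lemma in gold if lemmatize(form) == lemma)
    return correct / len(gold)

if __name__ == "__main__":
    for name, tool in [("suffix_stripper", suffix_stripper),
                       ("lexicon_lemmatizer", lexicon_lemmatizer)]:
        print(f"{name}: {accuracy(tool, GOLD):.2f}")
```

On this toy data the suffix stripper gets only "cats" right, while the lexicon-backed variant also handles the irregular and consonant-doubling forms it has entries for. This mirrors the general distinction between stemming and lemmatization named in the key words, not any result reported in the paper.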
Received: 12 January 2012      Published: 19 April 2012
Classification code: G350

Cite this article:

Wu Sizhu, Qian Qing, Hu Tiejun, Li Danya, Li Junlian, Hong Na. Contrast Analysis of Methods and Tools for Lemmatization. New Technology of Library and Information Service, 2012, 28(3): 27-34.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2012.03.05     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2012/V28/I3/27
