Research on the Part-of-Speech Tagging Method
Yin Jinling, Wang Huilin
(Institute of Scientific and Technical Information of China, Beijing 100038, China)
Abstract: Part-of-speech (POS) tagging is an important step in corpus construction and a fundamental research topic in natural language processing (NLP). After comparing the strengths and weaknesses of rule-based and statistical methods, this paper presents an automatic POS tagging method that combines conditional random fields (CRF) with transformation-based learning (TBL). Experiments show that the method improves tagging accuracy.
Received: 03 December 2008
Published: 25 March 2009
Corresponding Author:
Yin Jinling
E-mail: permafrost@163.com
About the authors: Yin Jinling, Wang Huilin