New Technology of Library and Information Service  2009, Vol. 3 Issue (3): 46-51    DOI: 10.11925/infotech.1003-3513.2009.03.08
Research on the Part-of-Speech Tagging Method
Yin Jinling, Wang Huilin
(Institute of Scientific and Technical Information of China, Beijing 100038, China)

Part-of-speech (POS) tagging is an essential step in corpus construction and a fundamental research topic in natural language processing (NLP). After comparing the strengths and weaknesses of rule-based and statistical methods, this paper presents an automatic POS tagging method that combines conditional random fields (CRF) with transformation-based learning (TBL). Experiments show that the method improves tagging accuracy.
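
The error-driven TBL stage named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the rule templates, the features, or how the statistical tagger's output is handed to TBL. The single rule template used here (change tag A to B when the previous tag is C), the toy tag sequences, and every function name are assumptions, and the `initial` tags merely stand in for the output of a statistical tagger such as a CRF.

```python
# Toy error-driven transformation-based learning (TBL) over tag sequences.
# Assumption: one rule template, (from_tag, to_tag, prev_tag), meaning
# "change from_tag to to_tag when the preceding tag is prev_tag".

def apply_rule(tags, rule):
    """Apply one rule to a tag sequence; context is read from the input tags."""
    from_tag, to_tag, prev_tag = rule
    return [
        to_tag if i > 0 and t == from_tag and tags[i - 1] == prev_tag else t
        for i, t in enumerate(tags)
    ]

def errors(tagged, gold):
    """Count positions where the current tags disagree with the gold tags."""
    return sum(t != g for ts, gs in zip(tagged, gold) for t, g in zip(ts, gs))

def learn_rules(initial, gold, max_rules=10):
    """Greedily pick the rule with the largest net error reduction each round."""
    current = [list(s) for s in initial]
    rules = []
    for _ in range(max_rules):
        # Candidate rules are proposed only at positions that are still wrong.
        candidates = {
            (cur[i], ref[i], cur[i - 1])
            for cur, ref in zip(current, gold)
            for i in range(1, len(cur))
            if cur[i] != ref[i]
        }
        base = errors(current, gold)
        best, best_gain = None, 0
        for rule in sorted(candidates):
            # Net gain = errors fixed minus correct tags broken by the rule.
            gain = base - errors([apply_rule(s, rule) for s in current], gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no rule reduces the error count any further
            break
        rules.append(best)
        current = [apply_rule(s, best) for s in current]
    return rules

def tag_with_rules(tags, rules):
    """Post-correct a baseline tagging by applying learned rules in order."""
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags

# Hypothetical baseline output (stand-in for a statistical tagger) and gold tags.
initial = [["TO", "NN"], ["DT", "NN"], ["PRP", "MD", "TO", "NN"]]
gold = [["TO", "VB"], ["DT", "NN"], ["PRP", "MD", "TO", "VB"]]

rules = learn_rules(initial, gold)
print(rules)
print(tag_with_rules(["DT", "NN", "TO", "NN"], rules))
```

On this toy data the learner keeps a single rule that retags a noun as a verb after `TO`, and leaves the noun after `DT` untouched. A real system would use richer templates (surrounding words, wider tag windows) and a held-out corpus.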

Key words: POS tagging; CRF; TBL; Error-driven
Received: 03 December 2008      Published: 25 March 2009


Corresponding Author: Yin Jinling
About authors: Yin Jinling, Wang Huilin

Cite this article:

Yin Jinling, Wang Huilin. Research on the Part-of-Speech Tagging Method. New Technology of Library and Information Service, 2009, 3(3): 46-51.


