Comparative Study on HMM and CRFs Applying in Information Extraction
Wang Hao, Deng Sanhong
(Department of Information Management, Nanjing University, Nanjing 210093, China)
|
Abstract: Building on a comparison of the mathematical theory underlying HMM and CRFs, this paper proposes two models for person-name entity extraction: one based on HMM with word role labels, and the other based on CRFs with character role labels. The effectiveness of the two models is then validated and compared through open testing and practical application. The results show in practice that CRFs are better suited than HMM to sequence labeling and object classification.
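Both models assign a label to every unit of the input sequence and then decode the most probable label path. Viterbi decoding is used at this step by both an HMM tagger and a linear-chain CRF. The sketch below is a minimal, hypothetical illustration and not the authors' implementation. It assumes a character-level BIO role set (`B-PER`, `I-PER`, `O`) in place of the paper's actual word and character role labels, which are not given here. The toy start, transition and emission probabilities are invented, and the function names `bio_labels` and `viterbi` are also hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical character-level role set (an assumption for illustration;
# the paper's actual word/character role labels are not reproduced here).
STATES = ["B-PER", "I-PER", "O"]


def bio_labels(text: str, spans: List[Tuple[int, int]]) -> List[str]:
    """Tag each character: B-PER starts a name, I-PER continues it, O is outside.

    `spans` are half-open (start, end) character offsets of person names.
    """
    labels = ["O"] * len(text)
    for start, end in spans:
        labels[start] = "B-PER"
        for i in range(start + 1, end):
            labels[i] = "I-PER"
    return labels


def viterbi(
    obs: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
    floor: float = 1e-12,
) -> List[str]:
    """Return the most probable state path for `obs` under a first-order HMM.

    Scores are summed in log space to avoid underflow; any probability that
    is zero or missing is replaced by `floor` (crude smoothing, an assumption).
    """

    def lg(p: float) -> float:
        return math.log(max(p, floor))

    # best[t][s]: log score of the best path that ends in state s at position t
    best = [{s: lg(start_p.get(s, 0.0)) + lg(emit_p[s].get(obs[0], 0.0)) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor r that maximises score(r) + log P(s | r)
            prev = max(states, key=lambda r: best[t - 1][r] + lg(trans_p[r].get(s, 0.0)))
            best[t][s] = (best[t - 1][prev] + lg(trans_p[prev].get(s, 0.0))
                          + lg(emit_p[s].get(obs[t], 0.0)))
            back[t][s] = prev
    # Trace back from the highest-scoring final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Toy parameters, invented purely for illustration.
start_p = {"B-PER": 0.4, "I-PER": 0.0, "O": 0.6}
trans_p = {
    "B-PER": {"I-PER": 0.9, "O": 0.1},
    "I-PER": {"I-PER": 0.2, "O": 0.8},
    "O": {"B-PER": 0.2, "O": 0.8},
}
emit_p = {
    "B-PER": {"王": 0.5, "张": 0.5},
    "I-PER": {"昊": 0.5, "三": 0.5},
    "O": {"在": 0.3, "南": 0.2, "京": 0.2, "大": 0.15, "学": 0.15},
}

sentence = "王昊在南京"  # "Wang Hao is in Nanjing"
decoded = viterbi(list(sentence), STATES, start_p, trans_p, emit_p)
gold = bio_labels(sentence, [(0, 2)])
print(decoded)  # ['B-PER', 'I-PER', 'O', 'O', 'O']
print(decoded == gold)  # True
```

Here `bio_labels` shows how a character-level role sequence can be built from annotated name spans, and `viterbi` recovers the same sequence from the toy HMM. A linear-chain CRF replaces the product of transition and emission probabilities with a weighted sum of feature functions that may look at the whole observation sequence. Its decoding step keeps the same dynamic-programming shape.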
Received: 11 October 2007
Published: 25 December 2007
|
Corresponding Author:
Wang Hao
E-mail: ywhaowang810710@sina.com
About the authors: Wang Hao, Deng Sanhong