Please wait a minute...
New Technology of Library and Information Service  2007, Vol. 2 Issue (12): 57-63    DOI: 10.11925/infotech.1003-3513.2007.12.12
Current Issue | Archive | Adv Search |
Comparative Study on HMM and CRFs Applying in Information Extraction
Wang Hao  Deng Sanhong
(Department of Information Management, Nanjing University,Nanjing 210093,China)
Download: PDF(931 KB)   HTML  
Export: BibTeX | EndNote (RIS)      
Abstract  

This paper brings forward two models for person-name entity extraction based on the comparison of math theory between HMM and CRFs, one using word role label based HMM and the other using character role label based CRFs, then validates and compares the effect of both by open-testing and applying in practice, and thereby proves in practice that CRFs is fitter for sequence labeling and object classifying than HMM.

Key wordsHMM      CRFs      Information extraction      Person-name entity extraction      Role label      Feature     
Received: 11 October 2007      Published: 25 December 2007
: 

TP311

 
Corresponding Authors: Wang Hao     E-mail: ywhaowang810710@sina.com
About author:: Wang Hao,Deng Sanhong

Cite this article:

Wang Hao,Deng Sanhong. Comparative Study on HMM and CRFs Applying in Information Extraction. New Technology of Library and Information Service, 2007, 2(12): 57-63.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2007.12.12     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2007/V2/I12/57

[1] 傅爱平. 计算语言学和自然语言信息处理研究和应用综述[EB/OL].[2007-10-01]. http://ling.cass.cn/yingyong/courses/nlpbase.htm
[2] 王昊. 基于层次模式匹配的命名实体识别模型[J]. 现代图书情报技术, 2007(5):62-68
[3] Zhou G D, Su J. Named Entity Recognition Using an HMM-based Chunk Tagger[C]. In:Proceedings of the 40th Annual Meeting of the ACL. Philadelphia, PA., USA, 2002:473-480
[4] Settles B. Biomedical Named Entity Recognition Using Conditional Random Fields and Rich Feature Sets[C]. In:Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine  and its Application(NLPBA). Geneva,Switzerland, 2004:103-107
[5] 詹卫东. 词汇分析(二)——从词串到词性标记串[EB/OL]. [2007-10-01]. http://ccl.pku.edu.cn/ doubtfire/course/computational linguistics/contents/Chapter_07_2_pdf_format.pdf.
[6] 钱晶, 张杰, 张涛. 基于最大熵的汉语人名地名识别方法研究[J]. 小型微型计算机系统, 2006, 27(9):1761-1765
[7] 向晓雯. 基于条件随机场的中文命名实体识别[D].厦门:厦门大学,2006.
[8] laputa. 最大熵模型与自然语言处理[EB/OL]. [2007-10-01]. http://www.cs.caltech.edu/~weixl/research/read/summary/MaxEnt2.ppt.
[9] 黄昌宁, 赵海. 由字构词——中文分词新方法[C]. 中国中文信息学会第六次全国会员代表大会暨成立二十五周年学术会议,2006
[10] 郭家清, 蔡东风, 王智超,等.一种基于条件随机场的人名识别[J]. 通讯与计算机,2007,4(2):22-25
[11] CRF++-0.49[CP/OL].[2007-10-01]. http://sourceforge.net

[1] Cheng Zhou,Hongqin Wei. Evaluating and Classifying Patent Values Based on Self-Organizing Maps and Support Vector Machine[J]. 数据分析与知识发现, 2019, 3(5): 117-124.
[2] Jiaming Liang,Jie Zhao,Zhou Jianlong,Zhenning Dong. Detecting Collusive Fraudulent Online Transaction with Implicit User Behaviors[J]. 数据分析与知识发现, 2019, 3(5): 125-138.
[3] Tingxin Wen,Yangzi Li,Jingshuang Sun. News Hotspots Discovery Method Based on Multi Factor Feature Selection and AFOA/K-means[J]. 数据分析与知识发现, 2019, 3(4): 97-106.
[4] Zhiqiang Liu,Yuncheng Du,Shuicai Shi. Extraction of Key Information in Web News Based on Improved Hidden Markov Model[J]. 数据分析与知识发现, 2019, 3(3): 120-128.
[5] Sisi Gui,Wei Lu,Xiaojuan Zhang. Temporal Intent Classification with Query Expression Feature[J]. 数据分析与知识发现, 2019, 3(3): 66-75.
[6] Zhanglu Tan,Zhaogang Wang,Han Hu. Study on a Method of Feature Classification Selection Based on χ2 Statistics[J]. 数据分析与知识发现, 2019, 3(2): 72-78.
[7] Guijun Yang,Xue Xu,Fuqiang Zhao. Predicting User Ratings with XGBoost Algorithm[J]. 数据分析与知识发现, 2019, 3(1): 118-126.
[8] Hui Li,Yaqing Chai. Fine-Grained Sentiment Analysis Based on Convolutional Neural Network[J]. 数据分析与知识发现, 2019, 3(1): 95-103.
[9] Youshi He,Shufang He. Sentiment Mining of Online Product Reviews Based on Domain Ontology[J]. 数据分析与知识发现, 2018, 2(8): 60-68.
[10] Dongmei Mu,Shan Jin,Yuanhong Ju. Finding Association Between Diseases and Genes from Literature Abstracts[J]. 数据分析与知识发现, 2018, 2(8): 98-106.
[11] Cheng Zhou,Hongqin Wei. Identifying Crowd Participants with Modified Random Forests Algorithm[J]. 数据分析与知识发现, 2018, 2(7): 46-54.
[12] Huihui Tang,Hao Wang,Zixuan Zhang,Xueying Wang. Extracting Names of Historical Events Based on Chinese Character Tags[J]. 数据分析与知识发现, 2018, 2(7): 89-100.
[13] Tingting Zhang,Yuxiang Zhao,Qinghua Zhu. Mining User Preferences in Crowdsourcing Community with Sensitivity Analysis[J]. 数据分析与知识发现, 2018, 2(5): 23-31.
[14] Tingxin Wen,Yangzi Li,Jingshuang Sun. Extracting Text Features with Improved Fruit Fly Optimization Algorithm[J]. 数据分析与知识发现, 2018, 2(5): 59-69.
[15] Lixin Zhou,Jie Lin. Extracting Product Features with NodeRank Algorithm[J]. 数据分析与知识发现, 2018, 2(4): 90-98.
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn