New Technology of Library and Information Service  2011, Vol. 27 Issue (5): 77-82    DOI: 10.11925/infotech.1003-3513.2011.05.12
Application Practice
Recognizing Named Entity from Free-text Customer Reviews: A Maximum Entropy Model-based Approach
Yu Chuanming1,2, Huang Jianqiu2, Guo Fei2
1. School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China;
2. Business School, University of Shanghai for Science and Technology, Shanghai 200093, China
Abstract: This paper introduces the basic concepts of Named Entity Recognition (NER) and analyzes its two basic approaches: rule-based and statistical. Taking the Maximum Entropy Model (MEM) as the theoretical basis, it conducts an empirical study on recognizing Chinese dish names. Six feature templates are designed according to the characteristics of Chinese named entities. Experimental results show that adding tagging features to the simple feature template effectively improves recognition performance. The features that help improve recognition are, in order: tagging features, POS combination features, backward POS dependency features, and word-form features.
Key words: Named entity recognition    Maximum entropy model    Customer reviews    Text mining
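To make the idea of feature templates in a maximum entropy tagger concrete, the following is a minimal sketch, not the authors' implementation. It assumes a B/I/O labeling scheme, four hypothetical templates loosely named after the feature types in the abstract (word form, preceding POS, POS combination, previous tag), made-up POS tags, and two hand-written toy sentences. The model computes p(y | x) as a normalized exponential of summed feature weights. The weights are fitted with plain stochastic gradient ascent, which is a stand-in chosen for brevity and may differ from the training procedure used in the paper.

```python
import math
from collections import defaultdict

LABELS = ["B", "I", "O"]  # B/I = begin/inside a dish name, O = outside (assumed scheme)

def extract_features(tokens, pos_tags, prev_label, i):
    """Instantiate four illustrative (hypothetical) feature templates at position i."""
    prev_pos = pos_tags[i - 1] if i > 0 else "<S>"
    return [
        f"word={tokens[i]}",                   # word-form feature
        f"prev_pos={prev_pos}",                # POS of the preceding token
        f"pos_pair={prev_pos}+{pos_tags[i]}",  # POS combination feature
        f"prev_label={prev_label}",            # tagging feature (previous label)
    ]

def predict_proba(weights, feats):
    """Maximum entropy form: p(y | x) = exp(sum_i w_i * f_i(x, y)) / Z(x)."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train(samples, epochs=200, lr=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood (assumed trainer)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in samples:
            probs = predict_proba(weights, feats)
            for y in LABELS:
                # observed feature count minus expected feature count
                grad = (1.0 if y == gold else 0.0) - probs[y]
                for f in feats:
                    weights[(f, y)] += lr * grad
    return weights

def tag(tokens, pos_tags, weights):
    """Greedy left-to-right decoding, feeding each predicted label forward."""
    labels, prev = [], "<S>"
    for i in range(len(tokens)):
        probs = predict_proba(weights, extract_features(tokens, pos_tags, prev, i))
        prev = max(probs, key=probs.get)
        labels.append(prev)
    return labels

# Toy, hand-made training data (not from the paper): (tokens, POS tags, labels).
TRAIN = [
    (["我", "喜欢", "宫保", "鸡丁"], ["r", "v", "n", "n"], ["O", "O", "B", "I"]),
    (["麻婆", "豆腐", "很", "好吃"], ["n", "n", "d", "a"], ["B", "I", "O", "O"]),
]

samples = []
for tokens, pos_tags, gold in TRAIN:
    for i, label in enumerate(gold):
        prev = gold[i - 1] if i > 0 else "<S>"  # gold previous label at training time
        samples.append((extract_features(tokens, pos_tags, prev, i), label))

weights = train(samples)
print(tag(["我", "喜欢", "宫保", "鸡丁"], ["r", "v", "n", "n"], weights))
```

Dropping the `prev_label` template from `extract_features` and retraining on a realistic corpus is one way to probe the abstract's claim that tagging features contribute most. On this toy data the difference would not be meaningful.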
Received: 2011-04-28
CLC number: TP391
Cite this article:
Yu Chuanming, Huang Jianqiu, Guo Fei. Recognizing Named Entity from Free-text Customer Reviews: A Maximum Entropy Model-based Approach[J]. New Technology of Library and Information Service, 2011, 27(5): 77-82. DOI: 10.11925/infotech.1003-3513.2011.05.12.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2011.05.12
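Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing 100190, China
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn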