New Technology of Library and Information Service (现代图书情报技术), 2015, Vol. 31, Issue 12: 42-47     https://doi.org/10.11925/infotech.1003-3513.2015.12.07
Research Paper
Implicit Feature Identification in Product Reviews (产品评论中的隐式属性抽取研究)
Zhang Li (张莉)1, Xu Xin (许鑫)2
1 Department of Computer Science and Technology, Nanjing University, Nanjing 210093, China;
2 Department of Information Science, Business School, East China Normal University, Shanghai 200241, China
Abstract

[Objective] Opinion mining for products has become a very active research topic in recent years. Its results can help filter harmful information, support public opinion analysis, guide consumer purchasing, and help businesses improve product performance. Implicit product features are very common in online review sentences and are difficult to mine, so studying them is of considerable importance. [Methods] Review sentences about a certain automobile brand that contain only explicit features are used to determine a refined set of opinion words spanning multiple parts of speech, and these are expanded into opinion clusters with Tongyici Cilin (the synonym thesaurus). Feature words are determined from common phrases in the domain, and weights are calculated from feature-opinion collocations, yielding a dictionary whose records take the form {Feature, Opinion, Weight}. A multi-strategy implicit feature extraction algorithm then uses this dictionary to extract implicit features, while also taking into account the similarity between the opinion word to be matched and the opinion words in the dictionary. [Results] The method extracts implicit features from review sentences effectively. The F-value reaches 75.55%, which is among the better results in existing research on implicit product feature extraction. [Limitations] The preliminary data annotation relies mainly on manual work, which is time-consuming and laborious. [Conclusions] The experimental results show that the proposed algorithm performs well and has some practical value.
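The dictionary-based extraction described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the weighting formula, the similarity measure, or the individual strategies, so this sketch assumes weights are relative co-occurrence frequencies, uses character-level Jaccard overlap as a stand-in similarity, and replaces the thesaurus with a small hand-written cluster table. The training pairs, English words, function names, and the `min_similarity` threshold are all invented for illustration.

```python
from collections import Counter, defaultdict

# Toy (feature, opinion) pairs taken from explicit-feature sentences.
# Invented data; the paper uses Chinese car reviews.
EXPLICIT_PAIRS = [
    ("price", "expensive"), ("price", "expensive"), ("price", "cheap"),
    ("engine", "powerful"), ("engine", "noisy"), ("engine", "noisy"),
    ("interior", "noisy"), ("interior", "spacious"),
]

# Hypothetical opinion clusters standing in for a synonym-thesaurus lookup.
SYNONYM_CLUSTERS = {
    "expensive": {"expensive", "pricey", "costly"},
    "cheap": {"cheap", "affordable"},
    "noisy": {"noisy", "loud"},
}


def build_dictionary(pairs, clusters):
    """Map each opinion word to {feature: weight}.

    Assumption: weight = share of the opinion's occurrences that
    collocate with the feature in the explicit-feature sentences.
    """
    counts = defaultdict(Counter)
    for feature, opinion in pairs:
        counts[opinion][feature] += 1
    dictionary = {}
    for opinion, feature_counts in counts.items():
        total = sum(feature_counts.values())
        weights = {f: c / total for f, c in feature_counts.items()}
        # Every word in the opinion's cluster shares the same records.
        for word in clusters.get(opinion, {opinion}):
            dictionary[word] = weights
    return dictionary


def char_jaccard(a, b):
    """Stand-in similarity: Jaccard overlap of the two words' character sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def extract_implicit_feature(opinion, dictionary, min_similarity=0.5):
    """Strategy 1: direct or cluster match. Strategy 2: most similar opinion."""
    if opinion not in dictionary:
        best = max(dictionary, key=lambda w: char_jaccard(opinion, w))
        if char_jaccard(opinion, best) < min_similarity:
            return None  # nothing in the dictionary is close enough
        opinion = best
    weights = dictionary[opinion]
    return max(weights, key=weights.get)  # highest-weight feature wins


DICTIONARY = build_dictionary(EXPLICIT_PAIRS, SYNONYM_CLUSTERS)
print(extract_implicit_feature("pricey", DICTIONARY))   # cluster match
print(extract_implicit_feature("noisier", DICTIONARY))  # similarity fallback
print(extract_implicit_feature("blue", DICTIONARY))     # no match
```

In this toy setup "noisy" collocates with both "engine" and "interior", so the weight decides which feature is returned. A real system would likely need a semantic similarity measure rather than character overlap, which is used here only to keep the sketch self-contained.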

Received: 2015-07-06      Published: 2016-04-06
Classification (CLC): TP309; G35
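Funding:

This work is one of the research outcomes of the National Social Science Fund of China project "Research on Chinese Opinion Mining Based on Linguistic Features" (Grant No. 11CYY031).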

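Corresponding author: Zhang Li, ORCID: 0000-0002-4934-7166, E-mail: zhl@nju.edu.cn
Author contributions: Zhang Li: collected, cleaned, and annotated the data; designed the overall framework and carried out the experiments; drafted the paper and revised the final version. Xu Xin: revised the paper; offered constructive suggestions and ideas for resolving key problems.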
Cite this article:
Zhang Li, Xu Xin. Implicit Feature Identification in Product Reviews[J]. New Technology of Library and Information Service, 2015, 31(12): 42-47.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2015.12.07      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2015/V31/I12/42

