Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (4): 63-71    DOI: 10.11925/infotech.2096-3467.2019.0146
Research Paper
Mining User Reviews with PreLM-FT Fine-Grain Sentiment Analysis
Shen Zhuo, Li Yan
School of Economics and Management, Beijing Forestry University, Beijing 100083, China
Abstract

[Objective] This paper analyzes user preferences from large volumes of user reviews, in order to identify shortcomings in products or services and provide a basis for improving them. [Methods] We collected user reviews of the catering industry from the DianPing website and pre-trained a language model on the large unlabeled corpus. We then fine-tuned the pre-trained language model with a small amount of labeled data. Finally, we quantified a sentiment score for each attribute mentioned in the reviews and combined these scores with the KANO model to analyze user preferences for products or services. [Results] The method converts catering-industry review data into user preferences for products or services. [Limitations] When applying the KANO model, we assume that all users share the same preference for a given attribute, which makes the overall preference analysis less accurate. [Conclusions] PreLM-FT fine-grained sentiment analysis can convert user review data into user preference scores even when only a small amount of labeled data is available.

Keywords: Review Mining; Online Review; Sentiment Analysis; Pre-training Language Model
Received: 2019-02-11
CLC number: C931.6
Corresponding author: Li Yan     E-mail: liyan88@bjfu.edu.cn
Cite this article:
Shen Zhuo, Li Yan. Mining User Reviews with PreLM-FT Fine-Grain Sentiment Analysis[J]. Data Analysis and Knowledge Discovery, 2020, 4(4): 63-71. DOI: 10.11925/infotech.2096-3467.2019.0146.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2019.0146
Figure 1  Research framework
Figure 2  Basic framework for ELMo fine-tuning
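Item              Total count    Number of categories
Training set      105,000        20 (labeled)
Validation set    15,000         20 (labeled)
Test set A        15,000         20 (to be predicted)
Test set B        200,000        20 (to be predicted)
Table 1  Dataset sizes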
Model        F1
fastText     0.545
CNN          0.668
ATAE-LSTM    0.680
GCAE         0.706
Table 2  Experimental results of the baseline systems
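Training data     Model     Fine-tuned LM    LM not fine-tuned
1,000 samples     ELMo      0.408            0.350
1,000 samples     ULMFiT    0.463            0.432
1,000 samples     BERT      0.498            0.471
10,000 samples    ELMo      0.598            0.410
10,000 samples    ULMFiT    0.628            0.512
10,000 samples    BERT      0.687            0.567
20,000 samples    ELMo      0.623            0.497
20,000 samples    ULMFiT    0.665            0.536
20,000 samples    BERT      0.700            0.631
Table 3  F1 scores of the pre-trained language models with and without fine-tuning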
Figure 3  F1 scores of each model on the different sample sets
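The comparison in Tables 2 and 3 can be made concrete with a little arithmetic. The sketch below copies the F1 values from those tables and reports, for each model and training-set size, how much fine-tuning the language model adds and how far the fine-tuned model remains from the strongest baseline (GCAE). The names `F1_TABLE3`, `BEST_BASELINE_F1` and `fine_tuning_gain` are hypothetical and introduced here only for illustration.

```python
# F1 scores copied from Table 3, keyed by number of labeled training samples.
# Each entry is (fine-tuned LM, LM not fine-tuned).
F1_TABLE3 = {
    1_000: {"ELMo": (0.408, 0.350), "ULMFiT": (0.463, 0.432), "BERT": (0.498, 0.471)},
    10_000: {"ELMo": (0.598, 0.410), "ULMFiT": (0.628, 0.512), "BERT": (0.687, 0.567)},
    20_000: {"ELMo": (0.623, 0.497), "ULMFiT": (0.665, 0.536), "BERT": (0.700, 0.631)},
}
BEST_BASELINE_F1 = 0.706  # GCAE, the strongest baseline in Table 2


def fine_tuning_gain(fine_tuned: float, not_fine_tuned: float) -> float:
    """Hypothetical helper: F1 gained by fine-tuning the language model."""
    return round(fine_tuned - not_fine_tuned, 3)


for n_samples, models in F1_TABLE3.items():
    for model, (fine_tuned, not_fine_tuned) in models.items():
        gain = fine_tuning_gain(fine_tuned, not_fine_tuned)
        gap = round(BEST_BASELINE_F1 - fine_tuned, 3)
        print(f"{n_samples:>6} samples  {model:<7} gain={gain:+.3f}  gap to GCAE={gap:+.3f}")
```

On these numbers, fine-tuning helps in every configuration, most of all for ELMo at 10,000 samples (a gain of 0.188), and BERT fine-tuned on 20,000 samples comes within 0.006 of GCAE.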
Model              Training epochs
Baseline models    30
ELMo               10
ULMFiT             10
BERT               5
Table 4  Minimum number of training epochs each model needs to reach its best F1
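Attribute                                     Xi          Yi
Dish - appearance                             0.629656    0.489519
Dish - portion size                           0.377323    0.455634
Dish - degree of recommendation               0.671923    0.484807
Dish - taste                                  0.529969    0.398278
Environment - cleanliness                     0.652665    0.637599
Environment - decoration                      0.741353    0.425082
Environment - noise level                     0.636749    0.574568
Environment - dining space                    0.481213    0.474076
Location - distance from business district    0.912399    0.478381
Location - easy to find                       0.588691    0.472365
Location - transport convenience              0.845822    0.466609
Other - experience of this visit              0.597486    0.673873
Other - willingness to return                 0.725646    0.768228
Price - value for money                       0.639418    0.495803
Price - discounts                             0.448763    0.480629
Price - price level                           0.062375    0.563576
Service - ease of parking                     0.392034    0.501802
Service - ordering/serving speed              0.148515    0.525192
Service - queue waiting time                  0.132728    0.518881
Service - staff attitude                      0.528419    0.518490
Table 5  Satisfaction scores for each attribute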
Figure 4  Scatter plot of the average satisfaction for each attribute
Attribute                                     ri        IRadj     Ii
Location - distance from business district    1.0302    1.1911    0.8649
Other - willingness to return                 1.0568    1.3538    0.7806
Location - transport convenience              0.9660    1.3546    0.7131
Environment - cleanliness                     0.9124    1.5477    0.5895
Other - experience of this visit              0.9006    1.5788    0.5704
Environment - noise level                     0.8577    1.6491    0.5201
Environment - decoration                      0.8546    1.6853    0.5071
Dish - degree of recommendation               0.8286    1.7351    0.4775
Price - value for money                       0.8091    1.7802    0.4545
Dish - appearance                             0.7976    1.8130    0.4399
Price - price level                           0.5670    1.3595    0.4171
Location - easy to find                       0.7548    1.9354    0.3900
Service - staff attitude                      0.7403    1.9157    0.3864
Environment - dining space                    0.6755    2.1011    0.3215
Service - queue waiting time                  0.5356    1.6763    0.3195
Service - ordering/serving speed              0.5458    1.7148    0.3183
Price - discounts                             0.6576    2.1131    0.3112
Service - ease of parking                     0.6368    2.0783    0.3064
Dish - taste                                  0.6629    2.3277    0.2848
Dish - portion size                           0.5916    2.2415    0.2639
Table 6  User satisfaction with each product attribute
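The numbers in Tables 5 and 6 appear to be linked by two simple relations: in the rows checked here, ri matches sqrt(Xi^2 + Yi^2) and Ii matches ri / IRadj to four decimal places. These relations are inferred from the tabulated values, not stated on this page, and how IRadj itself is obtained is not given here, so the sketch below treats it as an input. The function name `radius_and_importance` is hypothetical.

```python
import math

# (Xi, Yi) copied from Table 5 and IRadj copied from Table 6 for four attributes.
ROWS = {
    "distance from business district": (0.912399, 0.478381, 1.1911),
    "willingness to return": (0.725646, 0.768228, 1.3538),
    "price level": (0.062375, 0.563576, 1.3595),
    "portion size": (0.377323, 0.455634, 2.2415),
}


def radius_and_importance(x: float, y: float, ir_adj: float) -> tuple[float, float]:
    """Assumed relations: r = sqrt(x^2 + y^2) and I = r / IRadj."""
    r = math.hypot(x, y)
    return r, r / ir_adj


for name, (x, y, ir_adj) in ROWS.items():
    r, importance = radius_and_importance(x, y, ir_adj)
    print(f"{name:<34} r={r:.4f}  I={importance:.4f}")
```

Under this reading, an attribute ranks high in Table 6 when its satisfaction vector (Xi, Yi) is long relative to IRadj, which is why "price level" (small ri but also small IRadj) still ranks above attributes with larger ri such as "staff attitude".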