Data Analysis and Knowledge Discovery  2020, Vol. 4, Issue 4: 63-71     https://doi.org/10.11925/infotech.2096-3467.2019.0146
Research Paper
Mining User Reviews with PreLM-FT Fine-Grain Sentiment Analysis
Shen Zhuo, Li Yan (corresponding author)
School of Economics and Management, Beijing Forestry University, Beijing 100083, China
Abstract

[Objective] This paper analyzes user preferences from a large volume of user reviews in order to identify shortcomings in products or services and to provide a basis for improving them. [Methods] We collected user reviews of catering businesses from the Dianping website and pre-trained a language model on this large unlabeled corpus. We then fine-tuned the pre-trained language model with a small amount of labeled data. Finally, we quantified a sentiment score for each attribute mentioned in the reviews and combined these scores with the KANO model to analyze user preferences for products or services. [Results] The method converts catering review data into user preferences for products or services. [Limitations] When applying the KANO model, we assume that all users share the same preference for a given product attribute, which reduces the accuracy of the overall preference analysis. [Conclusions] PreLM-FT fine-grained sentiment analysis can convert user reviews into preference scores even when only a small amount of labeled data is available.

Keywords: Review Mining; Online Review; Sentiment Analysis; Pre-training Language Model
Received: 2019-02-11      Published: 2020-06-01
CLC Number (ZTFLH):  C931.6
Corresponding author: Li Yan     E-mail: liyan88@bjfu.edu.cn
Cite this article:
Shen Zhuo, Li Yan. Mining User Reviews with PreLM-FT Fine-Grain Sentiment Analysis[J]. Data Analysis and Knowledge Discovery, 2020, 4(4): 63-71.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2019.0146      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I4/63
Fig.1  Research framework
Fig.2  Basic framework of ELMo fine-tuning
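| Item | Total | Number of categories |
| Training set | 105,000 | 20 (labeled) |
| Validation set | 15,000 | 20 (labeled) |
| Test set A | 15,000 | 20 (to be predicted) |
| Test set B | 200,000 | 20 (to be predicted) |
Table 1  Dataset sizes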
| Model | F1 |
| fastText | 0.545 |
| CNN | 0.668 |
| ATAE-LSTM | 0.680 |
| GCAE | 0.706 |
Table 2  Experimental results of the baseline systems
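| Training data size | Model | Fine-tuned LM | LM not fine-tuned |
| 1,000 records | ELMo | 0.408 | 0.350 |
| 1,000 records | ULMFiT | 0.463 | 0.432 |
| 1,000 records | BERT | 0.498 | 0.471 |
| 10,000 records | ELMo | 0.598 | 0.410 |
| 10,000 records | ULMFiT | 0.628 | 0.512 |
| 10,000 records | BERT | 0.687 | 0.567 |
| 20,000 records | ELMo | 0.623 | 0.497 |
| 20,000 records | ULMFiT | 0.665 | 0.536 |
| 20,000 records | BERT | 0.700 | 0.631 |
Table 3  F1 scores of the pre-trained language models with and without fine-tuning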
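The effect of fine-tuning can be read off Table 3 by subtracting the two F1 columns. The following is a minimal sketch of that arithmetic, not part of the authors' method; the names `TABLE3_F1`, `BEST_BASELINE_F1`, and `fine_tuning_gains` are hypothetical, and the values are copied from Tables 2 and 3. The page does not state how much training data the Table 2 baselines used, so the final comparison is only indicative.

```python
# F1 scores copied from Table 3 as (fine-tuned LM, LM not fine-tuned).
TABLE3_F1 = {
    1_000: {"ELMo": (0.408, 0.350), "ULMFiT": (0.463, 0.432), "BERT": (0.498, 0.471)},
    10_000: {"ELMo": (0.598, 0.410), "ULMFiT": (0.628, 0.512), "BERT": (0.687, 0.567)},
    20_000: {"ELMo": (0.623, 0.497), "ULMFiT": (0.665, 0.536), "BERT": (0.700, 0.631)},
}
BEST_BASELINE_F1 = 0.706  # GCAE, the highest F1 in Table 2


def fine_tuning_gains(table):
    """Map (training size, model) to fine-tuned F1 minus non-fine-tuned F1."""
    return {
        (size, model): round(tuned - frozen, 3)
        for size, row in table.items()
        for model, (tuned, frozen) in row.items()
    }


gains = fine_tuning_gains(TABLE3_F1)
for (size, model), gain in sorted(gains.items()):
    print(f"{size:>6} records  {model:<7} gain from fine-tuning: {gain:+.3f}")

# Highest fine-tuned F1 in Table 3, compared with the best Table 2 baseline.
best_tuned = max(tuned for row in TABLE3_F1.values() for tuned, _ in row.values())
print(f"gap to best baseline: {BEST_BASELINE_F1 - best_tuned:+.3f}")
```

In every row of Table 3 the fine-tuned variant scores higher than the variant without fine-tuning, and the difference is largest for ELMo at 10,000 records.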
Fig.3  F1 scores of each model on different sample sets
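| Model | Training epochs |
| Baseline models | 30 |
| ELMo | 10 |
| ULMFiT | 10 |
| BERT | 5 |
Table 4  Minimum number of training epochs each model needs to reach its best F1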
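| Attribute | X_i | Y_i |
| Dish - Appearance | 0.629656 | 0.489519 |
| Dish - Portion size | 0.377323 | 0.455634 |
| Dish - Recommendation level | 0.671923 | 0.484807 |
| Dish - Taste | 0.529969 | 0.398278 |
| Environment - Cleanliness | 0.652665 | 0.637599 |
| Environment - Decoration | 0.741353 | 0.425082 |
| Environment - Noise | 0.636749 | 0.574568 |
| Environment - Dining space | 0.481213 | 0.474076 |
| Location - Distance from business district | 0.912399 | 0.478381 |
| Location - Easy to find | 0.588691 | 0.472365 |
| Location - Traffic convenience | 0.845822 | 0.466609 |
| Others - Experience of this visit | 0.597486 | 0.673873 |
| Others - Willingness to consume again | 0.725646 | 0.768228 |
| Price - Cost-effectiveness | 0.639418 | 0.495803 |
| Price - Discount | 0.448763 | 0.480629 |
| Price - Price level | 0.062375 | 0.563576 |
| Service - Parking convenience | 0.392034 | 0.501802 |
| Service - Ordering/serving speed | 0.148515 | 0.525192 |
| Service - Queue waiting time | 0.132728 | 0.518881 |
| Service - Staff attitude | 0.528419 | 0.518490 |
Table 5  Satisfaction for each attribute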
Fig.4  Scatter plot of the average satisfaction of each attribute
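| Attribute | r_i | IR_adj | I_i |
| Location - Distance from business district | 1.0302 | 1.1911 | 0.8649 |
| Others - Willingness to consume again | 1.0568 | 1.3538 | 0.7806 |
| Location - Traffic convenience | 0.9660 | 1.3546 | 0.7131 |
| Environment - Cleanliness | 0.9124 | 1.5477 | 0.5895 |
| Others - Experience of this visit | 0.9006 | 1.5788 | 0.5704 |
| Environment - Noise | 0.8577 | 1.6491 | 0.5201 |
| Environment - Decoration | 0.8546 | 1.6853 | 0.5071 |
| Dish - Recommendation level | 0.8286 | 1.7351 | 0.4775 |
| Price - Cost-effectiveness | 0.8091 | 1.7802 | 0.4545 |
| Dish - Appearance | 0.7976 | 1.8130 | 0.4399 |
| Price - Price level | 0.5670 | 1.3595 | 0.4171 |
| Location - Easy to find | 0.7548 | 1.9354 | 0.3900 |
| Service - Staff attitude | 0.7403 | 1.9157 | 0.3864 |
| Environment - Dining space | 0.6755 | 2.1011 | 0.3215 |
| Service - Queue waiting time | 0.5356 | 1.6763 | 0.3195 |
| Service - Ordering/serving speed | 0.5458 | 1.7148 | 0.3183 |
| Price - Discount | 0.6576 | 2.1131 | 0.3112 |
| Service - Parking convenience | 0.6368 | 2.0783 | 0.3064 |
| Dish - Taste | 0.6629 | 2.3277 | 0.2848 |
| Dish - Portion size | 0.5916 | 2.2415 | 0.2639 |
Table 6  User satisfaction with each product attribute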
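This page does not state the formulas that link Table 5 to Table 6. The tabulated numbers are nevertheless numerically consistent with r_i = sqrt(X_i^2 + Y_i^2) and I_i = r_i / IR_adj. The sketch below checks these two relations on seven rows. Both relations are inferred from the numbers and should be read as assumptions rather than as the authors' stated definitions; IR_adj is taken as given because its derivation is not shown here, and the names `ROWS` and `max_abs_errors` are hypothetical.

```python
import math

# (attribute, X_i, Y_i, r_i, IR_adj, I_i) copied from Tables 5 and 6.
ROWS = [
    ("Location - Distance from business district", 0.912399, 0.478381, 1.0302, 1.1911, 0.8649),
    ("Others - Willingness to consume again", 0.725646, 0.768228, 1.0568, 1.3538, 0.7806),
    ("Environment - Cleanliness", 0.652665, 0.637599, 0.9124, 1.5477, 0.5895),
    ("Price - Price level", 0.062375, 0.563576, 0.5670, 1.3595, 0.4171),
    ("Service - Ordering/serving speed", 0.148515, 0.525192, 0.5458, 1.7148, 0.3183),
    ("Dish - Taste", 0.529969, 0.398278, 0.6629, 2.3277, 0.2848),
    ("Dish - Portion size", 0.377323, 0.455634, 0.5916, 2.2415, 0.2639),
]


def max_abs_errors(rows):
    """Return the largest |computed - tabulated| gap for r_i and for I_i."""
    worst_r = worst_i = 0.0
    for _name, x, y, r_tab, ir_adj, i_tab in rows:
        r = math.hypot(x, y)   # assumed relation: r_i = sqrt(X_i^2 + Y_i^2)
        i = r / ir_adj         # assumed relation: I_i = r_i / IR_adj
        worst_r = max(worst_r, abs(r - r_tab))
        worst_i = max(worst_i, abs(i - i_tab))
    return worst_r, worst_i


worst_r, worst_i = max_abs_errors(ROWS)
print(f"largest gap in r_i: {worst_r:.6f}")
print(f"largest gap in I_i: {worst_i:.6f}")
```

Both gaps stay within the rounding of the four-decimal values printed in Table 6, which is what one would expect if the tables were produced with these relations.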