Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (4): 63-71    DOI: 10.11925/infotech.2096-3467.2019.0146
Mining User Reviews with PreLM-FT Fine-Grain Sentiment Analysis
Shen Zhuo, Li Yan
School of Economics and Management, Beijing Forestry University, Beijing 100083, China
Abstract  

[Objective] This paper identifies user preferences from their reviews of catering providers, aiming to find and improve unsatisfactory products or services. [Methods] First, we collected user reviews of the catering industry from the DianPing website as an unlabeled corpus and used it to pre-train a language model. Then, we fine-tuned the pre-trained language model with a small amount of labeled data. Finally, we quantified the sentiment scores of the attributes mentioned in user reviews and combined them with the KANO model to analyze user preferences for products or services. [Results] We successfully identified user preferences from their reviews. [Limitations] The KANO model might introduce some inaccuracy into the overall preference analysis. [Conclusions] The proposed method can effectively reveal user preferences using reviews and a small amount of labeled data.

Keywords: Review Mining; Online Review; Sentiment Analysis; Pre-training Language Model
Received: 11 February 2019      Published: 01 June 2020
ZTFLH:  C931.6  
Corresponding Authors: Li Yan     E-mail: liyan88@bjfu.edu.cn

Cite this article:

Shen Zhuo, Li Yan. Mining User Reviews with PreLM-FT Fine-Grain Sentiment Analysis. Data Analysis and Knowledge Discovery, 2020, 4(4): 63-71.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.0146     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I4/63

Figure: Research Framework
Figure: Basic Framework for Fine-tuning of ELMo
Table: Number of Data Sets
Item             Total      Number of categories
Training set     105,000    20 (labeled)
Validation set   15,000     20 (labeled)
Test set A       15,000     20 (to be predicted)
Test set B       200,000    20 (to be predicted)
Table: Baseline System Experiment Results
Model        F1
fastText     0.545
CNN          0.668
ATAE-LSTM    0.680
GCAE         0.706
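The tables on this page report a single F1 value per model. The page does not say how that value is averaged across classes, so the sketch below assumes macro-averaging (the unweighted mean of per-class F1). The function names and the toy labels are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1}, where F1 = 2 * P * R / (P + R) for each label."""
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy example with hypothetical sentiment labels.
y_true = ["pos", "pos", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neu"]
print(round(macro_f1(y_true, y_pred), 4))
```

If the reported scores are micro-averaged or averaged over the 20 categories in some other way, the per-class step stays the same and only the final aggregation changes.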
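Table: F1 of PreLM-FT Experiment
Training data size    Model     Fine-tuned LM    LM not fine-tuned
1,000 samples         ELMo      0.408            0.350
1,000 samples         ULMFiT    0.463            0.432
1,000 samples         BERT      0.498            0.471
10,000 samples        ELMo      0.598            0.410
10,000 samples        ULMFiT    0.628            0.512
10,000 samples        BERT      0.687            0.567
20,000 samples        ELMo      0.623            0.497
20,000 samples        ULMFiT    0.665            0.536
20,000 samples        BERT      0.700            0.631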
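To make the effect of language-model fine-tuning easier to read off the table above, the following sketch subtracts the F1 without LM fine-tuning from the F1 with LM fine-tuning for every model and training-set size. The numbers are copied from the table; the variable and function names are illustrative only.

```python
# (training samples, model) -> (F1 with fine-tuned LM, F1 without LM fine-tuning)
F1_SCORES = {
    (1_000, "ELMo"): (0.408, 0.350),
    (1_000, "ULMFiT"): (0.463, 0.432),
    (1_000, "BERT"): (0.498, 0.471),
    (10_000, "ELMo"): (0.598, 0.410),
    (10_000, "ULMFiT"): (0.628, 0.512),
    (10_000, "BERT"): (0.687, 0.567),
    (20_000, "ELMo"): (0.623, 0.497),
    (20_000, "ULMFiT"): (0.665, 0.536),
    (20_000, "BERT"): (0.700, 0.631),
}


def fine_tuning_gain(scores):
    """Return {(size, model): F1 gain from fine-tuning the LM}, rounded to 3 places."""
    return {key: round(tuned - frozen, 3) for key, (tuned, frozen) in scores.items()}


gains = fine_tuning_gain(F1_SCORES)
for (size, model), gain in sorted(gains.items()):
    print(f"{size:>6} samples  {model:<7} gain = {gain:+.3f}")

# Configuration with the largest absolute gain in the table.
best = max(gains, key=gains.get)
print("largest gain:", best, gains[best])
```

In these figures, fine-tuning the language model improves F1 in every configuration, and the gap is largest for ELMo with 10,000 training samples.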
Figure: The Best F1 of Each Model
Table: Minimum Number of Epochs to Reach the Best F1
Model              Training epochs
Baseline models    30
ELMo               10
ULMFiT             10
BERT               5
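Table: Satisfaction of Each Attribute
Attribute                                     Xi          Yi
Dish - Appearance                             0.629656    0.489519
Dish - Portion size                           0.377323    0.455634
Dish - Recommendation level                   0.671923    0.484807
Dish - Taste                                  0.529969    0.398278
Environment - Cleanliness                     0.652665    0.637599
Environment - Decoration                      0.741353    0.425082
Environment - Noise                           0.636749    0.574568
Environment - Dining space                    0.481213    0.474076
Location - Distance from business district    0.912399    0.478381
Location - Easy to find                       0.588691    0.472365
Location - Traffic convenience                0.845822    0.466609
Others - Overall experience of this visit     0.597486    0.673873
Others - Willingness to return                0.725646    0.768228
Price - Cost-effectiveness                    0.639418    0.495803
Price - Discount                              0.448763    0.480629
Price - Price level                           0.062375    0.563576
Service - Parking convenience                 0.392034    0.501802
Service - Ordering/serving speed              0.148515    0.525192
Service - Queue waiting time                  0.132728    0.518881
Service - Staff attitude                      0.528419    0.518490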
Figure: Scatterplot of Average Satisfaction of Various Attributes
Table: Customer Satisfaction of Each Attribute
Attribute                                     ri        IRadj     Ii
Location - Distance from business district    1.0302    1.1911    0.8649
Others - Willingness to return                1.0568    1.3538    0.7806
Location - Traffic convenience                0.9660    1.3546    0.7131
Environment - Cleanliness                     0.9124    1.5477    0.5895
Others - Overall experience of this visit     0.9006    1.5788    0.5704
Environment - Noise                           0.8577    1.6491    0.5201
Environment - Decoration                      0.8546    1.6853    0.5071
Dish - Recommendation level                   0.8286    1.7351    0.4775
Price - Cost-effectiveness                    0.8091    1.7802    0.4545
Dish - Appearance                             0.7976    1.8130    0.4399
Price - Price level                           0.5670    1.3595    0.4171
Location - Easy to find                       0.7548    1.9354    0.3900
Service - Staff attitude                      0.7403    1.9157    0.3864
Environment - Dining space                    0.6755    2.1011    0.3215
Service - Queue waiting time                  0.5356    1.6763    0.3195
Service - Ordering/serving speed              0.5458    1.7148    0.3183
Price - Discount                              0.6576    2.1131    0.3112
Service - Parking convenience                 0.6368    2.0783    0.3064
Dish - Taste                                  0.6629    2.3277    0.2848
Dish - Portion size                           0.5916    2.2415    0.2639
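The page does not spell out how the columns of the two attribute tables relate. The tabulated values are, however, numerically consistent with ri = sqrt(Xi^2 + Yi^2) and Ii = ri / IRadj. The sketch below checks this on three rows. Both relations are inferred from the numbers, not quoted from the paper, and IRadj is taken from the table as given because its derivation is not shown here. Attribute names are English renderings of the table labels.

```python
import math

# attribute -> (Xi, Yi, ri, IRadj, Ii), copied from the two attribute tables
ROWS = {
    "Location - Distance from business district": (0.912399, 0.478381, 1.0302, 1.1911, 0.8649),
    "Others - Willingness to return": (0.725646, 0.768228, 1.0568, 1.3538, 0.7806),
    "Price - Price level": (0.062375, 0.563576, 0.5670, 1.3595, 0.4171),
}


def vector_length(x, y):
    """Length of the (Xi, Yi) vector; assumed to correspond to ri."""
    return math.hypot(x, y)


def importance(r, ir_adj):
    """Ratio ri / IRadj; assumed to correspond to Ii."""
    return r / ir_adj


for name, (x, y, r_table, ir_adj, i_table) in ROWS.items():
    r_calc = vector_length(x, y)
    i_calc = importance(r_table, ir_adj)
    print(f"{name}: ri {r_calc:.4f} (table {r_table:.4f}), "
          f"Ii {i_calc:.4f} (table {i_table:.4f})")
```

Under these assumed relations, an attribute ranks high on Ii when its satisfaction vector is long relative to its adjusted ratio, which matches the ordering of the table.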