Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (9): 65-76     https://doi.org/10.11925/infotech.2096-3467.2021.1303
Research Paper
Classifying Reasons of Hotel Reviews with Domain ERNIE and BiLSTM Model*
Zhang Zhipeng, Mao Yusheng, Zhang Liyi
School of Information Management, Wuhan University, Wuhan 430072, China
Abstract

[Objective] To mine the reasons behind the opinions expressed in reviews on online booking platforms, this paper proposes an opinion reason sentence classification model, DERNIE-BiLSTM. [Methods] We built a hotel-domain review corpus containing millions of reviews and manually annotated a dataset, ORSC. The corpus was added to ERNIE's own pre-training data, and the resulting pre-trained model was used to extract text features from the ORSC dataset. A BiLSTM model then fused these features and identified the reviews that contain opinion reasons. [Results] On the ORSC dataset, the DERNIE classifier reached an accuracy of 91.33% and an F1 value of 91.20%. After the features were fused by the BiLSTM, accuracy rose to 94.57% and the F1 value to 94.62%. [Limitations] Pre-trained language models require a large training corpus, which affects computing speed and efficiency. [Conclusions] The feature extraction and fusion approach of the DERNIE-BiLSTM pre-trained model identifies opinion reason sentences in reviews more accurately.

Keywords: Online Review; Opinion Reason Sentence Classification; ERNIE Model; BiLSTM Model
Received: 2021-11-16      Published: 2022-10-26
CLC number (ZTFLH): TP391; G250
Funding: *National Natural Science Foundation of China (71874126)
Corresponding author: Zhang Liyi, ORCID: 0000-0001-8634-9227     E-mail: lyzhang@whu.edu.cn
Cite this article:
Zhang Zhipeng, Mao Yusheng, Zhang Liyi. Classifying Reasons of Hotel Reviews with Domain ERNIE and BiLSTM Model[J]. Data Analysis and Knowledge Discovery, 2022, 6(9): 65-76.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.1303      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I9/65
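Fig.1  Opinion reason identification model
Fig.2  Different masking strategies of BERT and ERNIE
Fig.3  Next sentence prediction model
Fig.4  DERNIE model structure
Fig.5  BiLSTM model structure
Fig.6  DERNIE-BiLSTM model structure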
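The contrast named in the Fig.2 caption can be illustrated with a small sketch. It is not the authors' code: it assumes only the well-known difference that BERT masks individual tokens (single characters in Chinese), while ERNIE masks whole phrases or entities as one unit. The sentence is built from example 5 of Table 3 with DERNIE's prediction filled in, the span boundaries are hand-marked, and the masking rates are arbitrary. BERT's additional random-replacement rule is left out for brevity.

```python
import random

MASK = "[MASK]"


def mask_chars(chars, rate, rng):
    """BERT-style sketch: every character is masked independently."""
    return [MASK if rng.random() < rate else c for c in chars]


def mask_spans(chars, spans, rate, rng):
    """ERNIE-style sketch: a phrase or entity span is masked as a whole."""
    out = list(chars)
    for start, end in spans:  # end index is exclusive
        if rng.random() < rate:
            out[start:end] = [MASK] * (end - start)
    return out


# Hypothetical input: "The location is just close to the train station".
chars = list("位置就是离火车站近")
# Hand-marked spans (an assumption): "位置" (location), "火车站" (train station).
spans = [(0, 2), (5, 8)]

rng = random.Random(0)  # fixed seed so repeated runs give the same masks
char_masked = mask_chars(chars, 0.15, rng)
span_masked = mask_spans(chars, spans, 0.5, rng)

if __name__ == "__main__":
    print("character-level:", "".join(char_masked))
    print("span-level:     ", "".join(span_masked))
```

With span-level masking the model has to recover "火车站" as a unit from its context, rather than guessing one character from the other two, which is the kind of cloze task shown in Table 3.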
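Category: opinion reason sentence
  1. Staff entered the room without permission.
  2. The room is really too small; two people cannot even walk side by side.
  3. No window, very small, extremely damp and stuffy. The water from the air conditioner is collected in a large mineral water bottle, and the toilet has no complete partition, which makes the room even damper. But overall, I stayed one night without any delay to my trip, so it was already OK.
Category: non-opinion-reason sentence
  1. Overall conditions are too poor.
  2. Booked for a friend; I don't know what it was like.
  3. There is a bathhouse downstairs and I don't know what is upstairs. At two or three in the morning there were many footsteps going up and down the stairs, which seriously disturbed my rest. A very poor experience!
Table 1  Examples from the ORSC dataset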
Hyperparameter                  TextCNN  DERNIE  BERT-BiLSTM  ERNIE-BiLSTM  DERNIE-BiLSTM
character embedding dimensions  100      768     768          768           768
hidden dimensions               100      768     768          768           768
max sequence length             64       64      64           64            64
batch size                      32       16      32           32            32
learning rate                   1e-3     3e-5    5e-5         3e-5          5e-5
epochs                          6        11      7            13            20
dropout                         0.5      0.1     0.1          0.1           0.1
Table 2  Hyperparameter settings for the ORSC experiments
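For anyone re-running the comparison, the settings in Table 2 could be kept in one typed structure, for example as sketched below. The class name `OrscConfig`, its field names, and the `CONFIGS` dictionary are hypothetical and are not taken from the paper; only the numeric values come from Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrscConfig:
    """One column of Table 2 (field names are illustrative assumptions)."""
    embedding_dim: int
    hidden_dim: int
    max_seq_len: int
    batch_size: int
    learning_rate: float
    epochs: int
    dropout: float


# Values copied from Table 2, one entry per model.
CONFIGS = {
    "TextCNN": OrscConfig(embedding_dim=100, hidden_dim=100, max_seq_len=64,
                          batch_size=32, learning_rate=1e-3, epochs=6, dropout=0.5),
    "DERNIE": OrscConfig(embedding_dim=768, hidden_dim=768, max_seq_len=64,
                         batch_size=16, learning_rate=3e-5, epochs=11, dropout=0.1),
    "BERT-BiLSTM": OrscConfig(embedding_dim=768, hidden_dim=768, max_seq_len=64,
                              batch_size=32, learning_rate=5e-5, epochs=7, dropout=0.1),
    "ERNIE-BiLSTM": OrscConfig(embedding_dim=768, hidden_dim=768, max_seq_len=64,
                               batch_size=32, learning_rate=3e-5, epochs=13, dropout=0.1),
    "DERNIE-BiLSTM": OrscConfig(embedding_dim=768, hidden_dim=768, max_seq_len=64,
                                batch_size=32, learning_rate=5e-5, epochs=20, dropout=0.1),
}

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(f"{name}: lr={cfg.learning_rate}, epochs={cfg.epochs}")
```

Freezing the dataclass keeps a run from silently mutating its settings, which makes it easier to match a result back to the column it came from.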
Fig.7  Change in the loss L during pre-training
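Example 1
  Sample: [MASK]很好,主动给我们介绍附近的景点。 ([MASK] was very good; they took the initiative to tell us about nearby attractions.)
  BERT prediction: 服台人务 (not a standard word)
  ERNIE prediction: 朋友关系 (friendship)
  DERNIE prediction: 服务态度 (service attitude)
Example 2
  Sample: 卫生差,[MASK]有小虫子咬得却都是疱 (Poor hygiene; [MASK] there are small bugs, and the bites left blisters everywhere.)
  BERT prediction: 虽然 (although)
  ERNIE prediction: 虽使 (not a standard modern word)
  DERNIE prediction: 床上 (on the bed)
Example 3
  Sample: [MASK]极差,住的人三六九等,半夜被吵醒多次 ([MASK] is extremely poor; the guests are a mixed crowd, and I was woken up many times during the night.)
  BERT prediction: 睡眠 (sleep)
  ERNIE prediction: 环境 (environment)
  DERNIE prediction: 隔音 (sound insulation)
Example 4
  Sample: 硬件设施[MASK],和其他酒店差距有点大! (The hardware facilities are [MASK]; there is quite a gap compared with other hotels!)
  BERT prediction: 不般 (not a standard word)
  ERNIE prediction: 方面 (aspect)
  DERNIE prediction: 一般 (mediocre)
Example 5
  Sample: 位置就是离[MASK]近,卫生很差 (The location is just close to [MASK]; hygiene is very poor.)
  BERT prediction: 酒店很 (hotel very)
  ERNIE prediction: 学校很 (school very)
  DERNIE prediction: 火车站 (train station)
Table 3  Results of the cloze experiment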
Method         Accuracy (%)  Precision (%)  Recall (%)  F1-score (%)
TextCNN        90.81         90.64          91.07       90.86
DERNIE         91.33         92.91          89.55       91.20
BERT-BiLSTM    92.57         92.27          92.97       92.62
ERNIE-BiLSTM   94.10         93.86          94.40       94.13
DERNIE-BiLSTM  94.57         94.00          95.25       94.62
Table 4  Results of the ORSC experiments
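The F1 column of Table 4 can be sanity-checked from the precision and recall columns, since F1 is by definition their harmonic mean. The sketch below is only a consistency check on the published numbers; the function and variable names are illustrative and do not come from the paper.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from Table 4.
TABLE_4 = {
    "TextCNN": (90.64, 91.07, 90.86),
    "DERNIE": (92.91, 89.55, 91.20),
    "BERT-BiLSTM": (92.27, 92.97, 92.62),
    "ERNIE-BiLSTM": (93.86, 94.40, 94.13),
    "DERNIE-BiLSTM": (94.00, 95.25, 94.62),
}


def max_f1_gap(rows) -> float:
    """Largest absolute difference between recomputed and reported F1."""
    return max(abs(f1_from_pr(p, r) - f1) for p, r, f1 in rows.values())


if __name__ == "__main__":
    for name, (p, r, f1) in TABLE_4.items():
        print(f"{name}: recomputed F1 = {f1_from_pr(p, r):.2f}, reported = {f1:.2f}")
```

The recomputed values agree with the reported ones to within about 0.01 percentage points, which is consistent with the inputs having been rounded to two decimals.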