[Objective] This paper proposes a classification model that identifies sentences stating the reasons behind opinions in hotel reviews from online booking platforms. [Methods] First, we built a pre-training corpus from millions of online reviews and manually annotated the ORSC dataset for the proposed model. Then, we extracted the text features of the ORSC dataset by incorporating the constructed corpus into the pre-training of the ERNIE model, which yields the domain model DERNIE. Finally, we used a BiLSTM model to merge all features and identify reviews that state reasons. [Results] On the ORSC dataset, the DERNIE model reached an accuracy of 91.33% and an F1 value of 91.20%. Adding BiLSTM features raised the accuracy to 94.57% and the F1 value to 94.62%. [Limitations] The pre-trained language model requires a large amount of data from the additional corpus, which may reduce computing speed and efficiency. [Conclusions] The proposed model can effectively identify reason sentences in online reviews.
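The accuracy and F1 figures reported above follow the usual definitions for a binary task. The sketch below is illustrative only: it assumes label 1 marks a reason sentence and that F1 is computed on that positive class, which the abstract does not specify. The function name `accuracy_and_f1` and the toy labels are hypothetical and are not taken from ORSC.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, where 1 marks a reason sentence."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # reason sentences found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed reason sentences
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall; defined as 0 when both are 0.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Hypothetical toy labels (assumption, not ORSC data).
gold = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(gold, pred)
print(f"accuracy={acc:.4f}, F1={f1:.4f}")
```

If the paper averaged F1 over both classes instead, the same counts would be computed once per class and the per-class scores averaged.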
Zhang Zhipeng, Mao Yusheng, Zhang Liyi. Classifying Reasons of Hotel Reviews with Domain ERNIE and BiLSTM Model[J]. Data Analysis and Knowledge Discovery, 2022, 6(9): 65-76.