Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (12): 1-12    DOI: 10.11925/infotech.2096-3467.2022.0109
Identifying Actionable Information from Online Reviews
Shang Lili1,2, Tang Huayun3, Wang Yanzhao3, Zuo Meiyun2
1Post-Doctoral Research Center, China Central Depository & Clearing Co., Ltd, Beijing 100033, China
2School of Information, Renmin University of China, Beijing 100871, China
3Blockchain Lab, ChinaBond Finance and Information Technology Co., Ltd, Beijing 100044, China
Abstract  

[Objective] This paper explores methods for automatically identifying actionable information in online reviews, with the aim of helping practitioners improve their follow-up work. [Methods] We formulated the task as sentence-level classification and proposed a span-based model (SAII). First, we encoded the input sentence with BERT to obtain token-level representations. Second, we enumerated all possible spans in the sentence and built informative span representations using an attention mechanism. Third, we applied a multi-channel filtering strategy that retains the spans closest to the key-element prototypes. Finally, we merged the refined span-level representations with the context representation to predict whether the sentence contains actionable information. [Results] We evaluated SAII on two real-world datasets (Yelp and RateMDs). Compared with the strongest baseline in each of three model categories (BiLSTM, MSNN, and CSI-BiLSTM), SAII’s F1 value was higher by 7.91/5.42, 2.10/2.73, and 1.94/1.46 percentage points on Yelp/RateMDs, respectively. [Limitations] More research is needed to evaluate the effectiveness of the model on multimodal datasets from different domains. [Conclusions] The SAII model can effectively identify actionable information in user-generated content.
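The four steps in the Methods summary can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: random vectors stand in for BERT token representations, a single hypothetical query vector drives the attention pooling, the multi-channel filter is reduced to one cosine-similarity channel against hypothetical prototype vectors, and every function name and size below is an assumption chosen for illustration.

```python
import math
import random

def enumerate_spans(n_tokens, max_width):
    """All (start, end) spans, end inclusive, at most max_width tokens wide."""
    return [(i, j) for i in range(n_tokens)
            for j in range(i, min(i + max_width, n_tokens))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_pool(vectors, query):
    """Softmax-weighted average of the token vectors inside one span."""
    scores = [dot(v, query) for v in vectors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    dim = len(vectors[0])
    return [sum(e / total * v[d] for e, v in zip(exps, vectors))
            for d in range(dim)]

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def filter_spans(span_vecs, prototypes, keep):
    """Indices of the `keep` spans most similar to any prototype."""
    best = [max(cosine(s, p) for p in prototypes) for s in span_vecs]
    order = sorted(range(len(span_vecs)), key=lambda k: best[k], reverse=True)
    return order[:keep]

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

random.seed(0)
dim = 8
# Assumption: random vectors replace real BERT outputs for a 6-token sentence.
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(6)]
query = [random.gauss(0, 1) for _ in range(dim)]
prototypes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]

spans = enumerate_spans(len(tokens), max_width=3)
span_vecs = [attention_pool(tokens[s:e + 1], query) for s, e in spans]
kept = filter_spans(span_vecs, prototypes, keep=4)
# Concatenate the pooled span features with a sentence-level context vector.
features = mean([span_vecs[k] for k in kept]) + mean(tokens)
```

With six tokens and a maximum width of three, the enumeration yields 15 candidate spans, of which four survive filtering. In a trained model the `features` vector would be passed to a classifier; here it only shows how span-level and context representations could be combined.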

Key words: Online Reviews; Span Model; NLP; Actionable Information; BERT
Received: 11 February 2022      Published: 03 February 2023
ZTFLH:  TP391; G203
Fund: Foundation of Beijing Key Laboratory of Green Development Decision Making Based on Big Data (Grant No. dm202103)
Corresponding Author: Zuo Meiyun, ORCID: 0000-0002-5281-5071     E-mail: zuomy@ruc.edu.cn

Cite this article:

Shang Lili, Tang Huayun, Wang Yanzhao, Zuo Meiyun. Identifying Actionable Information from Online Reviews. Data Analysis and Knowledge Discovery, 2022, 6(12): 1-12.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0109     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I12/1

Examples for Problem Definition
The SAII Model
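| Dataset | Training set: actionable | Training set: non-actionable | Test set: actionable | Test set: non-actionable | Total sentences |
|---|---|---|---|---|---|
| Yelp | 1,441 | 4,146 | 667 | 1,728 | 7,982 |
| RateMDs | 1,822 | 3,704 | 727 | 1,641 | 7,894 |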
Statistics of Datasets
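| Category | Model | Yelp P/% | Yelp R/% | Yelp F1/% | RateMDs P/% | RateMDs R/% | RateMDs F1/% |
|---|---|---|---|---|---|---|---|
| Sequence models | Elman-RNN | 74.60 | 55.47 | 63.63 | 83.41 | 74.69 | 78.81 |
| Sequence models | LSTM | 74.70 | 56.67 | 64.45 | 84.24 | 74.29 | 78.95 |
| Sequence models | BiLSTM | 77.87 | 56.92 | 65.77 | 86.01 | 74.58 | 79.89 |
| Sentence-level classification models | Coattention-LSTM | 79.13 | 58.36 | 67.18 | 87.38 | 76.29 | 81.46 |
| Sentence-level classification models | MSNN | 79.66 | 64.98 | 71.58 | 87.32 | 78.32 | 82.58 |
| AII models | CSI-BiLSTM | 82.97 | 63.19 | 71.74 | 90.13 | 78.39 | 83.85 |
| AII models | SAII (OURS) | 84.52 | 65.30 | 73.68 | 89.28 | 81.67 | 85.31 |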
Experiment Results of Compared Models
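As a quick consistency check on the results table, the F1 values can be recomputed from the reported precision and recall, and the F1 gains quoted in the abstract can be reproduced by subtraction. The sketch below assumes the standard definition F1 = 2PR / (P + R); the variable names are illustrative, and only the strongest baseline of each category and SAII are included.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)

# (P, R, reported F1) copied from the results table: Yelp first, then RateMDs.
reported = {
    "BiLSTM":     [(77.87, 56.92, 65.77), (86.01, 74.58, 79.89)],
    "MSNN":       [(79.66, 64.98, 71.58), (87.32, 78.32, 82.58)],
    "CSI-BiLSTM": [(82.97, 63.19, 71.74), (90.13, 78.39, 83.85)],
    "SAII":       [(84.52, 65.30, 73.68), (89.28, 81.67, 85.31)],
}

# Largest gap between a recomputed F1 and the reported one, per model.
max_gap = {
    model: max(abs(f1(p, r) - f) for p, r, f in rows)
    for model, rows in reported.items()
}

# F1 gain of SAII over each baseline, in percentage points (Yelp, RateMDs).
gains = {
    model: [round(s[2] - b[2], 2) for s, b in zip(reported["SAII"], rows)]
    for model, rows in reported.items()
    if model != "SAII"
}
```

Every recomputed F1 agrees with the reported value to within rounding, and `gains` reproduces the 7.91/5.42, 2.10/2.73, and 1.94/1.46 figures, which shows that they are absolute differences in percentage points rather than relative increases.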
| Model | Yelp P/% | Yelp R/% | Yelp F1/% | RateMDs P/% | RateMDs R/% | RateMDs F1/% |
|---|---|---|---|---|---|---|
| w/o POS Encoder | 82.39 | 65.70 | 73.11 | 88.41 | 81.69 | 84.92 |
| w/o Span-Level Layer | 80.21 | 62.64 | 70.35 | 86.75 | 80.52 | 83.52 |
| w/o Attention Max Pooling | 82.97 | 63.93 | 72.22 | 87.61 | 81.02 | 84.19 |
| Mean Pooling | 82.27 | 64.02 | 72.01 | 88.41 | 80.06 | 84.03 |
| w/o Filtering Mechanism | 82.58 | 63.50 | 71.80 | 87.02 | 80.56 | 83.67 |
| SAII (Full Model) | 84.52 | 65.30 | 73.68 | 89.28 | 81.67 | 85.31 |
Ablation Study Results
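One way to read the ablation table is to rank the variants by how much F1 falls relative to the full model. The short sketch below does this with the F1 values copied from the table; the helper name `f1_drops` is hypothetical, and the ranking is only a descriptive summary, not a significance test.

```python
full = {"Yelp": 73.68, "RateMDs": 85.31}

# F1 (%) of each ablated variant, copied from the ablation table.
ablations = {
    "w/o POS Encoder":           {"Yelp": 73.11, "RateMDs": 84.92},
    "w/o Span-Level Layer":      {"Yelp": 70.35, "RateMDs": 83.52},
    "w/o Attention Max Pooling": {"Yelp": 72.22, "RateMDs": 84.19},
    "Mean Pooling":              {"Yelp": 72.01, "RateMDs": 84.03},
    "w/o Filtering Mechanism":   {"Yelp": 71.80, "RateMDs": 83.67},
}

def f1_drops(dataset):
    """Variants sorted by F1 drop (percentage points) from the full model."""
    drops = {
        name: round(full[dataset] - scores[dataset], 2)
        for name, scores in ablations.items()
    }
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)

for dataset in full:
    print(dataset, f1_drops(dataset))
```

On both datasets the largest drop comes from removing the span-level layer (3.33 points on Yelp, 1.79 on RateMDs), followed by the filtering mechanism, while removing the POS encoder costs the least.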
Probability Distribution of Candidate Spans
Effect of Span Length on Performances