Identifying Actionable Information from Online Reviews<sup>*</sup>

Data Analysis and Knowledge Discovery, 2022, Vol. 6, Issue 12: 1-12

doi: 10.11925/infotech.2096-3467.2022.0109 (https://doi.org/10.11925/infotech.2096-3467.2022.0109)

Research Paper



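Shang Lili¹,², Tang Huayun³, Wang Yanzhao³, Zuo Meiyun²

¹Post-Doctoral Research Center, China Central Depository & Clearing Co., Ltd., Beijing 100033, China
²School of Information, Renmin University of China, Beijing 100871, China
³Blockchain Lab, ChinaBond Finance and Information Technology Co., Ltd., Beijing 100044, China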



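Abstract

[Objective] Taking online review texts as the research object, this paper studies methods for identifying actionable information, offering practitioners guidance for leveraging their strengths and addressing their shortcomings. [Methods] We define the task as sentence-level classification and propose SAII, a span-based model for identifying actionable information. SAII first encodes the input sentence with the pre-trained BERT model to build token-level contextual representations. It then enumerates spans of different lengths within the sentence and applies a span attention mechanism to generate informative span-level representations. To mitigate noise, a multi-channel span filtering mechanism retains, as far as possible, the spans closest to the prototypes of key elements. Finally, the model fuses the refined span representations with the contextual representation to identify actionable information automatically. [Results] Experiments on two real-world datasets show that the proposed model performs best. Compared with the best model in each of the three categories of baselines, SAII improves F1 on the Yelp and RateMDs datasets by 7.91 and 5.42, 2.10 and 2.73, and 1.94 and 1.46 percentage points, respectively. [Limitations] The model's effectiveness still needs broader validation on multi-domain and multimodal datasets. [Conclusions] The proposed model captures both token-level and span-level representations, effectively improves identification accuracy, and helps realize the value of user-generated content.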
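The abstract describes the SAII pipeline only at a high level, so the following is a minimal, framework-free Python sketch of the span-side steps under explicitly assumed mechanics, not the authors' implementation. Assumptions: plain lists of floats stand in for BERT token vectors; span attention is taken to be a softmax over dot products with a single learned query vector; the multi-channel filter is taken to mean that each prototype vector (one per channel) keeps its `top_k` most cosine-similar spans and the channels' picks are unioned; fusion is taken to be concatenating a sentence context vector with the mean of the kept spans, followed by a logistic output. All names (`enumerate_spans`, `span_attention`, `filter_spans`, `predict`) and parameters (`max_len`, `query`, `prototypes`, `top_k`, `weights`, `bias`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Span = Tuple[int, int]  # (start, end) token indices, both inclusive


def enumerate_spans(n_tokens: int, max_len: int) -> List[Span]:
    """List every contiguous span of 1..max_len tokens, shortest spans first."""
    return [
        (start, start + length - 1)
        for length in range(1, max_len + 1)
        for start in range(n_tokens - length + 1)
    ]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def span_attention(tokens: Sequence[Vector], span: Span, query: Vector) -> Vector:
    """ASSUMED span attention: softmax-weighted average of the span's token
    vectors, scored by dot product with one hypothetical learned query vector."""
    start, end = span
    inside = tokens[start:end + 1]
    weights = softmax([dot(t, query) for t in inside])
    dim = len(inside[0])
    return [sum(w * t[d] for w, t in zip(weights, inside)) for d in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def filter_spans(span_reps: Sequence[Vector], prototypes: Sequence[Vector],
                 top_k: int) -> List[int]:
    """ASSUMED multi-channel filter: each prototype is one channel that keeps the
    indices of its top_k most similar spans; the channels' picks are unioned."""
    kept = set()
    for proto in prototypes:
        ranked = sorted(
            range(len(span_reps)),
            key=lambda i: cosine(span_reps[i], proto),
            reverse=True,
        )
        kept.update(ranked[:top_k])
    return sorted(kept)


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def predict(context: Vector, span_reps: Sequence[Vector], kept: Sequence[int],
            weights: Vector, bias: float) -> float:
    """ASSUMED fusion: concatenate the sentence context vector with the mean of
    the kept span vectors, then apply a logistic (binary) classifier."""
    dim = len(context)
    if kept:
        pooled = [sum(span_reps[i][d] for i in kept) / len(kept) for d in range(dim)]
    else:
        pooled = [0.0] * dim
    fused = list(context) + pooled
    return sigmoid(dot(fused, weights) + bias)


if __name__ == "__main__":
    # Toy 2-d vectors standing in for BERT token outputs (illustrative values only).
    tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
    spans = enumerate_spans(len(tokens), max_len=2)
    reps = [span_attention(tokens, s, query=[1.0, -1.0]) for s in spans]
    kept = filter_spans(reps, prototypes=[[1.0, 0.0], [0.0, 1.0]], top_k=2)
    prob = predict([0.5, 0.5], reps, kept, weights=[0.1, -0.2, 0.3, 0.4], bias=0.0)
    print([spans[i] for i in kept], round(prob, 3))
```

Running the file prints the retained spans and a toy probability. In a trained model the query, prototypes, weights, and bias would presumably be learned parameters rather than hand-picked numbers; this sketch only shows how the data flows from tokens to spans to a sentence-level decision.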


Keywords: Online Reviews, Span Model, Natural Language Processing (NLP), Actionable Information, BERT


Received: 2022-02-11; Published: 2023-02-03

CLC Number: TP391; G203

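*Funding: This work is one of the research outcomes of a project funded by the Beijing Key Laboratory of Big Data Decision-Making for Green Development (No. dm202103).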

Corresponding author: Zuo Meiyun, ORCID: 0000-0002-5281-5071, E-mail: zuomy@ruc.edu.cn

Cite this article:

Shang Lili, Tang Huayun, Wang Yanzhao, Zuo Meiyun. Identifying Actionable Information from Online Reviews[J]. Data Analysis and Knowledge Discovery, 2022, 6(12): 1-12.

Permalink:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0109 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I12/1

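Fig. 1 Example of the problem definition

Fig. 2 Architecture of the SAII model

Table 1 Statistics of the datasets

Table 2 Results of the comparative experiments

Table 3 Results of the ablation experiments

Fig. 3 Probability distribution of candidate spans

Fig. 4 Effect of the maximum span length on performance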
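Fig. 4 varies the longest span the model is allowed to enumerate. As a worked illustration of why that cap matters (a standard counting identity, not a formula stated in the paper), assume spans are contiguous runs of 1 to $L$ tokens in a sentence of $n$ tokens, with $L \le n$. There are $n - \ell + 1$ spans of length $\ell$, so the number of candidate spans is:

```latex
N(n, L) = \sum_{\ell=1}^{L} (n - \ell + 1) = nL - \frac{L(L-1)}{2}, \qquad 1 \le L \le n
```

For example, a 20-token sentence yields $N(20, 8) = 160 - 28 = 132$ candidate spans with $L = 8$, versus $N(20, 20) = 20 \cdot 21 / 2 = 210$ without a cap. For a fixed $L$ the candidate set grows linearly in $n$ rather than quadratically, which is one plausible reason to bound the span length.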

