Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (12): 1-12     https://doi.org/10.11925/infotech.2096-3467.2022.0109
Research Paper
Identifying Actionable Information from Online Reviews*
Shang Lili 1,2, Tang Huayun 3, Wang Yanzhao 3, Zuo Meiyun 2 (corresponding author)
1 Post-Doctoral Research Center, China Central Depository & Clearing Co., Ltd, Beijing 100033, China
2 School of Information, Renmin University of China, Beijing 100871, China
3 Blockchain Lab, ChinaBond Finance and Information Technology Co., Ltd, Beijing 100044, China
Abstract

[Objective] This paper studies methods for identifying actionable information in online review text, giving practitioners a reference for building on their strengths and remedying their weaknesses. [Methods] We define the task as sentence-level classification and propose a span-based actionable information identification model (SAII). First, the input sentence is encoded with the pre-trained BERT model to build token-level contextual representations. Second, spans of different widths are enumerated from the sentence, and a span attention mechanism generates informative span-level representations. Third, to mitigate noise, a multi-channel span filtering mechanism retains the spans closest to the key element prototypes. Finally, the refined span representations are fused with the context representations to identify actionable information. [Results] Experiments on two real-world datasets show that the proposed model performs best. Compared with the best model in each of the three baseline categories, SAII improves F1 on the Yelp and RateMDs datasets by 7.91 and 5.42 percentage points, 2.10 and 2.73 percentage points, and 1.94 and 1.46 percentage points, respectively. [Limitations] The model still needs to be validated on multi-domain and multimodal datasets. [Conclusions] The model combines token-level and span-level representations, improves identification accuracy, and helps realize the value of user-generated content.

Key words: Online Reviews; Span Model; NLP; Actionable Information; BERT
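The span enumeration step named in the abstract can be illustrated with a short sketch. The code below lists every contiguous token span up to a maximum width. The function name `enumerate_spans`, the whitespace tokenization, the example sentence, and the `max_width` value are illustrative assumptions, not details taken from the paper, which works on BERT token representations.

```python
from typing import List, Tuple


def enumerate_spans(tokens: List[str], max_width: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end inclusive, for every contiguous
    span containing at most `max_width` tokens."""
    if max_width < 1:
        raise ValueError("max_width must be at least 1")
    spans = []
    for start in range(len(tokens)):
        # The span end is bounded by the width limit and the sentence length.
        last = min(start + max_width, len(tokens))
        for end in range(start, last):
            spans.append((start, end))
    return spans


# Hypothetical review sentence, tokenized on whitespace for illustration only.
tokens = "the waiting time was far too long".split()
spans = enumerate_spans(tokens, max_width=3)
print(len(spans))
print([" ".join(tokens[s:e + 1]) for s, e in spans[:4]])
```

For a sentence of n tokens and a maximum width w (w ≤ n), this produces n·w − w(w − 1)/2 candidates, so the candidate set grows quickly as w increases.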
Received: 2022-02-11      Published: 2023-02-03
CLC number (ZTFLH):  TP391; G203
Fund: *This work is one of the research outcomes of a fund project of the Beijing Key Laboratory of Big Data Decision-Making for Green Development (dm202103).
Corresponding author: Zuo Meiyun, ORCID: 0000-0002-5281-5071     E-mail: zuomy@ruc.edu.cn
Cite this article:
Shang Lili, Tang Huayun, Wang Yanzhao, Zuo Meiyun. Identifying Actionable Information from Online Reviews[J]. Data Analysis and Knowledge Discovery, 2022, 6(12): 1-12.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0109      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I12/1
Fig.1  Example of the problem definition
Fig.2  Architecture of the SAII model
Dataset    Training set                   Test set                       Total
           Actionable   Non-actionable    Actionable   Non-actionable
Yelp       1,441        4,146             667          1,728             7,982
RateMDs    1,822        3,704             727          1,641             7,894
Table 1  Dataset statistics (number of sentences)
Category                              Model             Yelp P/%  Yelp R/%  Yelp F1/%  RateMDs P/%  RateMDs R/%  RateMDs F1/%
Sequence models                       Elman-RNN         74.60     55.47     63.63      83.41        74.69        78.81
                                      LSTM              74.70     56.67     64.45      84.24        74.29        78.95
                                      BiLSTM            77.87     56.92     65.77      86.01        74.58        79.89
Sentence-level classification models  Coattention-LSTM  79.13     58.36     67.18      87.38        76.29        81.46
                                      MSNN              79.66     64.98     71.58      87.32        78.32        82.58
AII models                            CSI-BiLSTM        82.97     63.19     71.74      90.13        78.39        83.85
                                      SAII (ours)       84.52     65.30     73.68      89.28        81.67        85.31
Table 2  Results of the comparison experiments
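As a quick consistency check on Table 2, the sketch below recomputes F1 as the harmonic mean of precision and recall for SAII and the strongest baseline in each category, then derives SAII's F1 margins. The helper name `f1` and the dictionary layout are illustrative choices; the numbers are copied from the table.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, F1) in percent, copied from Table 2: (Yelp, RateMDs) per model.
table2 = {
    "BiLSTM": ((77.87, 56.92, 65.77), (86.01, 74.58, 79.89)),
    "MSNN": ((79.66, 64.98, 71.58), (87.32, 78.32, 82.58)),
    "CSI-BiLSTM": ((82.97, 63.19, 71.74), (90.13, 78.39, 83.85)),
    "SAII": ((84.52, 65.30, 73.68), (89.28, 81.67, 85.31)),
}

for model, rows in table2.items():
    for p, r, reported in rows:
        # Print the recomputed F1 next to the reported value.
        print(model, round(f1(p, r), 2), reported)

# F1 margin of SAII over the strongest baseline in each category,
# in percentage points, as (Yelp, RateMDs).
for baseline in ("BiLSTM", "MSNN", "CSI-BiLSTM"):
    margins = [round(table2["SAII"][i][2] - table2[baseline][i][2], 2) for i in (0, 1)]
    print(baseline, margins)
```

For these four models the recomputed F1 values agree with the reported ones to two decimals, and the margins reproduce the percentage-point gains quoted in the abstract (7.91/5.42, 2.10/2.73, 1.94/1.46).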
Model                      Yelp P/%  Yelp R/%  Yelp F1/%  RateMDs P/%  RateMDs R/%  RateMDs F1/%
w/o POS Encoder            82.39     65.70     73.11      88.41        81.69        84.92
w/o Span-Level Layer       80.21     62.64     70.35      86.75        80.52        83.52
w/o Attention Max Pooling  82.97     63.93     72.22      87.61        81.02        84.19
Mean Pooling               82.27     64.02     72.01      88.41        80.06        84.03
w/o Filtering Mechanism    82.58     63.50     71.80      87.02        80.56        83.67
SAII (Full Model)          84.52     65.30     73.68      89.28        81.67        85.31
Table 3  Results of the ablation experiments
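One way to read Table 3 is to express each variant as an F1 drop relative to the full model. The sketch below does that arithmetic and ranks the variants by their average drop over the two datasets. The averaging and ranking are an illustrative choice made here, not an analysis reported in the paper; the F1 values are copied from the table.

```python
# F1 (%) from Table 3: (Yelp, RateMDs) for each variant.
full = (73.68, 85.31)
ablations = {
    "w/o POS Encoder": (73.11, 84.92),
    "w/o Span-Level Layer": (70.35, 83.52),
    "w/o Attention Max Pooling": (72.22, 84.19),
    "Mean Pooling": (72.01, 84.03),
    "w/o Filtering Mechanism": (71.80, 83.67),
}

# F1 drop relative to the full model, in percentage points.
drops = {
    name: tuple(round(f - a, 2) for f, a in zip(full, scores))
    for name, scores in ablations.items()
}

# Rank variants by their average drop across the two datasets.
ranked = sorted(drops.items(), key=lambda kv: sum(kv[1]) / len(kv[1]), reverse=True)
for name, (yelp, ratemds) in ranked:
    print(f"{name:28s} Yelp -{yelp:.2f}  RateMDs -{ratemds:.2f}")
```

Under this reading, removing the span-level layer costs the most F1 on both datasets and removing the POS encoder costs the least.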
Fig.3  Probability distribution of candidate spans
Fig.4  Effect of the maximum span width on performance