Data Analysis and Knowledge Discovery, 2023, Vol. 7, Issue 1: 63-75     https://doi.org/10.11925/infotech.2096-3467.2022.0207
Research Paper
Mining Differentiated Demands with Aspect Word Extraction: Case Study of Smartphone Reviews
(Chinese title, translated: Research on a Differentiated Demand Mining Method Based on the CWSA Aspect Word Extraction Model: A Case Study of JD.com Smartphone Reviews*)
Xiao Yuhan, Lin Huiping
School of Software & Microelectronics, Peking University, Beijing 102600, China
Abstract

[Objective] This paper proposes a deep learning method for aspect word extraction to support differentiated and fine-grained analysis of user demands. [Methods] We designed a Context Window Self-Attention (CWSA) model for aspect word extraction. Building on the overall information of the full text, the model focuses on the semantics inside a context window and of the adjacent text, and mines fine-grained product features from reviews. We then applied aspect-level sentiment analysis to examine user demands. [Results] We constructed a Chinese dataset for aspect word extraction and aspect-level sentiment analysis from JD.com smartphone reviews. On this dataset, the CWSA model reached an F1 score of 89.65%, outperforming the baseline aspect word extraction models. [Limitations] Publicly available Chinese aspect word datasets are scarce. Future work will build Chinese datasets for multiple products to enable richer experimental analysis, and will extend the model's cross-language adaptability on English datasets. [Conclusions] Applying the model to nearly 900,000 JD.com smartphone reviews shows that it can provide enterprises with differentiated and fine-grained demand mining.

Key words: Deep Learning; Aspect Word Extraction; Sentiment Analysis; Differentiated Demand Mining
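Received: 2022-03-13      Published: 2023-02-16
CLC number: TP391
Funding: *One of the research outcomes of the National Key Research and Development Program of China (2018YFB1702900).
Corresponding author: Lin Huiping, ORCID: 0000-0002-0500-1163, E-mail: linhp@ss.pku.edu.cn.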
Cite this article:
Xiao Yuhan, Lin Huiping. Mining Differentiated Demands with Aspect Word Extraction: Case Study of Smartphone Reviews. Data Analysis and Knowledge Discovery, 2023, 7(1): 63-75.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0207      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I1/63
Fig.1  Workflow of the demand mining method based on the CWSA aspect word extraction model
Fig.2  Architecture of the CWSA model
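The abstract describes CWSA only at a high level: the model keeps the overall information of the full text while focusing on the semantics inside a context window and of the adjacent text. One way to picture this is ordinary self-attention over the whole review running alongside a second self-attention branch whose scores are masked to a ±w token window, with the two outputs combined. The NumPy sketch below is an illustrative assumption, not the authors' architecture: the window size `w`, the head size, and the concatenation of the two branches in the hypothetical function `context_window_layer` are all choices made for illustration.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    # Row-wise softmax; subtracting the row maximum keeps exp() stable.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def window_mask(n: int, w: int) -> np.ndarray:
    # (n, n) boolean mask: True where token j is within w positions of token i.
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= w


def attention(x, wq, wk, wv, mask=None):
    # Scaled dot-product self-attention over one sequence x of shape (n, d).
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    if mask is not None:
        # Masked positions get a huge negative score, so their weight becomes 0.
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores)
    return weights @ v, weights


def context_window_layer(x, params, w=2):
    # Hypothetical combination: a global branch attends over the whole review,
    # a local branch only over a +/- w window; the outputs are concatenated.
    global_out, _ = attention(x, *params["global"])
    local_out, local_weights = attention(x, *params["local"], mask=window_mask(len(x), w))
    return np.concatenate([global_out, local_out], axis=-1), local_weights


rng = np.random.default_rng(0)
n, d, d_k = 8, 16, 4  # tokens, embedding size, head size (illustrative values)
params = {
    branch: tuple(rng.normal(size=(d, d_k)) for _ in range(3))
    for branch in ("global", "local")
}
x = rng.normal(size=(n, d))  # stand-in for contextual token embeddings
out, local_weights = context_window_layer(x, params, w=2)
print(out.shape)  # (8, 8)
```

In the local branch every token's attention weights are exactly zero outside its window, so that branch can only mix information from nearby tokens, while the global branch still sees the entire review.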
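Review text
Label | O | O | O | O | B | I | O | O | O | B | I | I | I | O | O | O
Sentiment polarity | -2 | -2 | -2 | -2 | -1 | -1 | -2 | -2 | -2 | 1 | 1 | 1 | 1 | -2 | -2 | -2
Table 1  Example of dataset annotation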
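Table 1 pairs a BIO tag sequence with a per-token polarity sequence: non-aspect tokens carry -2, and every token of an aspect word carries that aspect's polarity value. The sketch below recovers aspect spans and their polarities from those two rows. It assumes that -2 is only a filler for non-aspect tokens and reads each span's polarity from its first (B) token; the helper name `decode_bio` is hypothetical, and because the review text itself is not preserved here, spans are returned as token index ranges.

```python
def decode_bio(tags, polarities):
    """Return (start, end, polarity) for each aspect span; end is exclusive."""
    spans = []
    start = None
    # A trailing "O" sentinel closes a span that runs to the end of the review.
    for i, tag in enumerate(list(tags) + ["O"]):
        if start is not None and tag != "I":
            # The open span ends before token i; read polarity from its B token.
            spans.append((start, i, polarities[start]))
            start = None
        if tag == "B":
            start = i
        # An "I" with no open span is treated like "O" and ignored.
    return spans


tags = "O O O O B I O O O B I I I O O O".split()
polarities = [int(p) for p in "-2 -2 -2 -2 -1 -1 -2 -2 -2 1 1 1 1 -2 -2 -2".split()]
print(decode_bio(tags, polarities))  # [(4, 6, -1), (9, 13, 1)]
```

Applied to the two rows of Table 1, this yields a two-token aspect with polarity -1 and a four-token aspect with polarity 1.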
Model | F1 (%)
BiLSTM | 76.46
CNN-CRF | 80.59
BiLSTM-CRF | 81.60
CGN | 82.09
BiLSTM-CNN-CRF | 82.23
Seq2Seq4ATE | 83.44
BERT-base | 87.99
BERT-BiLSTM-CRF | 88.34
LCF-ATEPC | 88.82
CWSA (this paper) | 89.65
Table 2  Comparison of experimental results
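Table 2 reports F1 only; CWSA leads the strongest baseline, LCF-ATEPC (88.82%), by 0.83 percentage points. The scoring protocol is not stated in this section. A common convention for aspect term extraction is exact-match span F1, in which a predicted aspect counts as correct only if its boundaries match a gold aspect exactly. The sketch below computes that metric as an assumption about, not a description of, the paper's evaluation; the `(sentence_id, start, end)` span representation is hypothetical.

```python
def span_f1(gold, pred):
    """Micro-averaged exact-match F1 between gold and predicted aspect spans."""
    gold, pred = set(gold), set(pred)
    true_pos = len(gold & pred)  # spans whose boundaries match exactly
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans as (sentence_id, start, end) with exclusive end.
gold = {(0, 4, 6), (0, 9, 13), (1, 0, 2)}
pred = {(0, 4, 6), (0, 9, 12), (1, 0, 2)}  # second span stops one token early
print(round(span_f1(gold, pred), 4))  # 0.6667
```

Under exact matching, a span that is off by a single token counts as both a false positive and a false negative, which is why boundary accuracy weighs heavily in this metric.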
User rating | Share (%)
1 | 13.80
2 | 3.20
3 | 8.10
4 | 0.80
5 | 74.10
Table 3  Distribution of user ratings
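Fig.3  Distribution of user review lengths
Fig.4  Length distributions of Jieba segmentation results versus words extracted by CWSA
Fig.5  Share of the summed aspect word frequencies in each interval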
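Fig.5 summarizes how the total frequency of aspect words is split across intervals. The interval boundaries are not given in this section, so the sketch below assumes frequency intervals defined by hypothetical lower bounds (`edges`) and computes each interval's share of the summed frequency. The function name and the toy frequencies are illustrative, not taken from the paper.

```python
from bisect import bisect_right


def interval_shares(freqs, edges):
    """Share of the summed frequency contributed by each frequency interval.

    edges are sorted lower bounds: [1, 10, 100] gives [1, 10), [10, 100), [100, inf).
    Frequencies below edges[0] are ignored.
    """
    totals = [0] * len(edges)
    for f in freqs:
        i = bisect_right(edges, f) - 1  # index of the interval containing f
        if i >= 0:
            totals[i] += f
    grand_total = sum(totals)
    if grand_total == 0:
        return [0.0] * len(edges)
    return [t / grand_total for t in totals]


# Toy per-word frequencies (hypothetical, not from the paper).
freqs = [1, 2, 5, 20, 72, 900]
print(interval_shares(freqs, [1, 10, 100]))  # [0.008, 0.092, 0.9]
```

Weighting each interval by summed frequency rather than by word count shows how a few very frequent aspect words can dominate the total even when most distinct words are rare.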
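Aspect word | Total frequency | Positive | Neutral | Negative
屏幕 (screen) | 250,088 | 176,901 | 47,378 | 25,809
运行速度 (running speed) | 248,014 | 203,120 | 13,429 | 21,465
拍照效果 (photo quality) | 215,501 | 175,628 | 17,103 | 22,770
音效 (sound quality) | 200,430 | 154,316 | 32,675 | 13,439
充电速度 (charging speed) | 7,503 | 6,967 | 173 | 363
发货速度 (shipping speed) | 5,764 | 5,223 | 83 | 458
电池容量 (battery capacity) | 5,162 | 4,256 | 340 | 566
屏幕分辨率 (screen resolution) | 3,138 | 2,466 | 287 | 385
价保 (price protection) | 1,613 | 35 | 16 | 1,562
客服服务态度 (customer service attitude) | 1,439 | 1,133 | 45 | 261
后置摄像头 (rear camera) | 915 | 504 | 89 | 322
相素 (pixels; nonstandard spelling of 像素) | 432 | 343 | 12 | 77
120Hz刷新率 (120Hz refresh rate) | 370 | 347 | 4 | 19
双立体声扬声器 (dual stereo speakers) | 40 | 38 | 2 | 0
諾基亚 (Nokia; mixed traditional/simplified spelling) | 6 | 2 | 0 | 4
超级快冲 (super fast charging; nonstandard spelling of 超级快充) | 5 | 5 | 0 | 0
Table 4  Examples of aspect words in JD.com smartphone reviews and their sentiment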
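Table 4 aggregates the model's outputs: for each extracted aspect word, the number of mentions in total and per sentiment polarity. A minimal sketch of that aggregation step follows. It assumes the pipeline yields one `(aspect, polarity)` pair per mention and that polarities are encoded as 1 / 0 / -1 for positive / neutral / negative; that encoding, the function name `tally_aspects`, and the toy mentions are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical polarity encoding; the paper's own encoding may differ.
POLARITY_NAMES = {1: "positive", 0: "neutral", -1: "negative"}


def tally_aspects(pairs):
    """Aggregate (aspect, polarity) mentions into Table 4-style rows.

    Each row is (aspect, total, positive, neutral, negative), sorted by total.
    """
    counts = defaultdict(Counter)
    for aspect, polarity in pairs:
        counts[aspect][POLARITY_NAMES[polarity]] += 1
    rows = [
        (aspect, sum(c.values()), c["positive"], c["neutral"], c["negative"])
        for aspect, c in counts.items()
    ]
    # list.sort is stable, so aspects with equal totals keep first-seen order.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Toy mentions for illustration only.
mentions = [("屏幕", 1), ("屏幕", 1), ("屏幕", -1), ("价保", -1), ("音效", 0)]
for row in tally_aspects(mentions):
    print(row)
```

Sorting by total frequency places widely discussed aspects at the top, while rare variant spellings such as those at the bottom of Table 4 remain visible as separate rows.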