Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (1): 63-75    DOI: 10.11925/infotech.2096-3467.2022.0207
Mining Differentiated Demands with Aspect Word Extraction: Case Study of Smartphone Reviews
Xiao Yuhan, Lin Huiping
School of Software & Microelectronics, Peking University, Beijing 102600, China
Abstract  

[Objective] This paper proposes a deep learning algorithm for aspect word extraction, aiming to support differentiated and fine-grained analysis of user demands. [Methods] We designed a Context Window Self-Attention (CWSA) model to extract aspect words. Building on the overall information of the full text, the model focuses on the semantics of the context window and adjacent text. We then extracted fine-grained product features from user reviews and conducted aspect-level sentiment analysis to further examine user demands. [Results] We constructed a Chinese dataset for aspect word extraction and aspect-level sentiment analysis with nearly 900,000 reviews of smartphones sold on JD.com. The proposed CWSA model reached an F1 score of 89.65% on this dataset, outperforming the baseline models. [Limitations] Few publicly accessible Chinese datasets exist for aspect word extraction and aspect-level sentiment analysis. More Chinese and English datasets covering multiple products need to be constructed to improve the model's cross-language adaptability. [Conclusions] The proposed model improves differentiated and fine-grained data mining.
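
The idea of attending only to a context window can be illustrated with a small sketch. This is not the CWSA architecture, which this page does not specify. It is plain scaled dot-product self-attention with a hypothetical `window` parameter that masks out tokens farther than `window` positions away. Queries, keys, and values are the raw input vectors, with no learned projections, and the way the paper combines the window with full-text information is not modeled.

```python
import math


def window_mask(seq_len, window):
    """mask[i][j] is True when token j lies within `window` positions of token i."""
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]


def softmax(scores):
    """Numerically stable softmax; masked (-inf) scores get weight 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def windowed_self_attention(x, window):
    """Self-attention over token vectors `x`, restricted to a local window.

    Assumption for illustration: queries, keys and values are all `x` itself.
    """
    n, d = len(x), len(x[0])
    mask = window_mask(n, window)
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            if mask[i][j]:
                dot = sum(a * b for a, b in zip(x[i], x[j]))
                scores.append(dot / math.sqrt(d))
            else:
                scores.append(float("-inf"))  # token j is outside the window
        weights = softmax(scores)
        out.append([sum(w * x[j][k] for j, w in enumerate(weights)) for k in range(d)])
    return out


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
local = windowed_self_attention(tokens, window=1)
```

With `window=0` every token attends only to itself, so the output equals the input; larger windows mix in progressively more of the neighbouring tokens.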

Keywords: Deep Learning; Aspect Word Extraction; Sentiment Analysis; Differentiated Demand Mining
Received: 13 March 2022      Published: 16 February 2023
ZTFLH:  TP391  
Fund: National Key R&D Program of China (2018YFB1702900)
Corresponding Author: Lin Huiping, ORCID: 0000-0002-0500-1163, E-mail: linhp@ss.pku.edu.cn

Cite this article:

Xiao Yuhan, Lin Huiping. Mining Differentiated Demands with Aspect Word Extraction: Case Study of Smartphone Reviews. Data Analysis and Knowledge Discovery, 2023, 7(1): 63-75.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0207     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I1/63

Flow Chart of Demand Mining Method Based on CWSA Aspect Word Extraction Model
Network Structure of CWSA
Review text
Label               O   O   O   O   B   I   O   O   O   B   I   I   I   O   O   O
Sentiment polarity  -2  -2  -2  -2  -1  -1  -2  -2  -2  1   1   1   1   -2  -2  -2
An Example of the Dataset Labeling
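
A labeling row like the one above can be decoded into aspect spans with a short routine. The sketch below assumes that `B` opens an aspect word, `I` continues it, and `O` lies outside any aspect word, and that the polarity of a span can be read from its first token. Treating -2 as the value for tokens outside any aspect word is an assumption drawn from the example row, and the function name `decode_aspects` is hypothetical.

```python
def decode_aspects(labels, polarities):
    """Collect (start, end, polarity) spans from BIO labels; `end` is exclusive."""
    spans = []
    start = None
    for i, tag in enumerate(labels + ["O"]):  # sentinel "O" closes a trailing span
        if start is not None and tag != "I":
            # The current span ends here; take the polarity of its first token.
            spans.append((start, i, polarities[start]))
            start = None
        if tag == "B":
            start = i
    return spans


labels = "O O O O B I O O O B I I I O O O".split()
polarities = [int(p) for p in "-2 -2 -2 -2 -1 -1 -2 -2 -2 1 1 1 1 -2 -2 -2".split()]
spans = decode_aspects(labels, polarities)
print(spans)  # [(4, 6, -1), (9, 13, 1)]
```

The example row therefore contains two aspect words: one covering positions 4 to 5 with polarity -1 and one covering positions 9 to 12 with polarity 1.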
Model               F1/%
BiLSTM              76.46
CNN-CRF             80.59
BiLSTM-CRF          81.60
CGN                 82.09
BiLSTM-CNN-CRF      82.23
Seq2Seq4ATE         83.44
BERT-base           87.99
BERT-BiLSTM-CRF     88.34
LCF-ATEPC           88.82
CWSA (this paper)   89.65
Comparison of Experimental Results
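
For reference, F1 is the harmonic mean of precision and recall. This page does not say whether the scores above are computed per token or per extracted span, so the sketch below assumes exact-match span-level scoring, a common choice for aspect term extraction. The function name `span_f1` and the sample spans are illustrative only.

```python
def span_f1(gold, predicted):
    """Exact-match F1 between two collections of (start, end) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans: one prediction matches exactly, one has a wrong boundary.
gold_spans = [(4, 6), (9, 13)]
predicted_spans = [(4, 6), (9, 12)]
score = span_f1(gold_spans, predicted_spans)
print(score)  # 0.5
```

Under this scoring a span with a single wrong boundary counts as both a missed gold span and a spurious prediction, which is why boundary errors on long aspect words are costly.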
User rating   Share/%
1             13.80
2             3.20
3             8.10
4             0.80
5             74.10
Distribution of User Ratings
Distribution of User Comment Length
Length Distribution of Jieba Segmentation Results and that of Words Extracted by CWSA
Proportion of the Sum of Aspect Word Frequency in Each Interval
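Aspect word                                                   Total frequency   Positive   Neutral   Negative
屏幕 (screen)                                                  250,088           176,901    47,378    25,809
运行速度 (running speed)                                        248,014           203,120    13,429    21,465
拍照效果 (photo quality)                                        215,501           175,628    17,103    22,770
音效 (sound quality)                                           200,430           154,316    32,675    13,439
充电速度 (charging speed)                                       7,503             6,967      173       363
发货速度 (shipping speed)                                       5,764             5,223      83        458
电池容量 (battery capacity)                                     5,162             4,256      340       566
屏幕分辨率 (screen resolution)                                   3,138             2,466      287       385
价保 (price protection)                                        1,613             35         16        1,562
客服服务态度 (customer service attitude)                          1,439             1,133      45        261
后置摄像头 (rear camera)                                         915               504        89        322
相素 (pixels; misspelling of 像素)                              432               343        12        77
120Hz刷新率 (120Hz refresh rate)                                370               347        4         19
双立体声扬声器 (dual stereo speakers)                              40                38         2         0
諾基亚 (Nokia; mixed traditional and simplified characters)      6                 2          0         4
超级快冲 (super fast charging; misspelling of 超级快充)            5                 5          0         0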
Examples of Aspect Words and Corresponding Emotional Attitudes from JD Mobile Phone Reviews
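
One way to read the table above is to convert the sentiment counts of each aspect word into shares and look for aspects dominated by negative sentiment. The sketch below does this for four rows copied from the table. The English keys are glosses of the Chinese aspect words, and the function names are hypothetical, not part of the paper.

```python
# (total frequency, positive, neutral, negative), copied from the table above.
rows = {
    "screen": (250088, 176901, 47378, 25809),
    "running speed": (248014, 203120, 13429, 21465),
    "charging speed": (7503, 6967, 173, 363),
    "price protection": (1613, 35, 16, 1562),
}


def sentiment_shares(total, positive, neutral, negative):
    """Percentage of mentions per sentiment class, relative to the total frequency."""
    counts = {"positive": positive, "neutral": neutral, "negative": negative}
    return {name: 100 * count / total for name, count in counts.items()}


def is_consistent(total, positive, neutral, negative):
    """True when the three sentiment counts add up to the total frequency."""
    return positive + neutral + negative == total


def most_negative(table):
    """Aspect word with the highest share of negative mentions."""
    return max(table, key=lambda aspect: table[aspect][3] / table[aspect][0])


worst = most_negative(rows)
worst_share = sentiment_shares(*rows[worst])["negative"]
print(worst, round(worst_share, 1))  # price protection 96.8
```

Among these rows, price protection stands out with roughly 96.8% negative mentions even though its total frequency is low. The `is_consistent` check also shows that the running speed row, as printed here, does not add up: its three sentiment counts sum to 238,014 rather than 248,014, which suggests a transcription error in one of its figures.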