Data Analysis and Knowledge Discovery  2024, Vol. 8 Issue (5): 102-112    DOI: 10.11925/infotech.2096-3467.2023.0519
Sentiment Analysis of User Reviews Integrating Margin Sampling and Tri-training
Jiang Yiping 1, Zhang Ting 1, Xia Zhengming 1, Li Yuhua 2, Zhang Zhaotong 1
1. College of Information Management, Nanjing Agricultural University, Nanjing 210031, China
2. College of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210031, China
Abstract  

[Objective] This paper proposes a sentiment analysis method for user reviews that integrates margin sampling and Tri-training. It addresses three difficulties: the large volume of user reviews, their ambiguous sentiment tendencies, and their short content. [Methods] First, we construct a multi-class support vector machine using a one-vs-all decomposition strategy. Then, we apply a margin sampling strategy that takes cosine similarity into account to create an initial set. Finally, we propose a Tri-training algorithm with a soft voting mechanism. [Results] The improved voting mechanism further reduces the probability that multiple classifiers misjudge a sample, and precision exceeds 79% for every sentiment category. [Limitations] The proposed method does not extract information from multimedia data. [Conclusions] Compared with traditional and recently improved semi-supervised learning algorithms, the proposed algorithm achieves higher classification accuracy and efficiency.
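
The soft voting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact rule, so the following is an assumption: when one Tri-training classifier is being retrained, its two peers each output a class-probability vector for an unlabeled review, the vectors are averaged, and the pseudo-label is accepted only if the averaged probability reaches a threshold. The function name `soft_vote` and the `threshold` value are hypothetical.

```python
from typing import Optional, Sequence


def soft_vote(prob_a: Sequence[float],
              prob_b: Sequence[float],
              threshold: float = 0.6) -> Optional[int]:
    """Average two peers' class probabilities (assumed soft-voting rule).

    Returns the index of the winning class, or None when the averaged
    confidence is below `threshold` and the sample should stay unlabeled.
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("probability vectors must have the same length")
    avg = [(a + b) / 2 for a, b in zip(prob_a, prob_b)]
    best = max(range(len(avg)), key=avg.__getitem__)
    return best if avg[best] >= threshold else None


# Peers agree with high confidence: the pseudo-label is accepted.
confident = soft_vote([0.1, 0.8, 0.1], [0.2, 0.7, 0.1])
# Peers disagree: the averaged confidence is too low, so no label is assigned.
uncertain = soft_vote([0.6, 0.4, 0.0], [0.3, 0.7, 0.0])
```

Under this assumed rule, a sample on which the peers disagree is left in the unlabeled pool rather than being forced into a class, which is one way a soft vote could lower the misjudgment rate relative to hard majority voting.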

Keywords: User Reviews; Sentiment Analysis; Margin Sampling; Tri-Training
Received: 31 May 2023      Published: 08 January 2024
ZTFLH: TP391; G350
Fund: Social Science Foundation of Jiangsu Province (21GLC003); Humanity and Social Science Project of Ministry of Education of China (22YJA630033); Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX23_0229)
Corresponding Author: Zhang Zhaotong, ORCID: 0000-0002-1155-8603, E-mail: zzt5576@njau.edu.cn

Cite this article:

Jiang Yiping, Zhang Ting, Xia Zhengming, Li Yuhua, Zhang Zhaotong. Sentiment Analysis of User Reviews Integrating Margin Sampling and Tri-training. Data Analysis and Knowledge Discovery, 2024, 8(5): 102-112.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2023.0519     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2024/V8/I5/102

E-Commerce Review Sentiment Analysis Framework Based on Margin Sampling and Tri-training
Improved Margin Sampling
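
A margin sampling strategy that considers cosine similarity could look roughly like the sketch below. The page does not describe the procedure, so the selection rule is an assumption: the margin of a sample is taken as the gap between its two highest classifier scores, samples are visited from smallest margin upward, and a candidate is skipped when its feature vector is too similar (by cosine similarity) to one already selected. The names `select_by_margin` and `max_sim` are hypothetical.

```python
import math
from typing import List, Sequence


def margin(scores: Sequence[float]) -> float:
    """Gap between the two highest class scores (small gap = uncertain sample)."""
    top = sorted(scores, reverse=True)
    return top[0] - top[1]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_by_margin(features: Sequence[Sequence[float]],
                     scores: Sequence[Sequence[float]],
                     k: int,
                     max_sim: float = 0.9) -> List[int]:
    """Pick up to k low-margin samples that are not near-duplicates (assumed rule)."""
    order = sorted(range(len(scores)), key=lambda i: margin(scores[i]))
    chosen: List[int] = []
    for i in order:
        if len(chosen) == k:
            break
        if all(cosine(features[i], features[j]) < max_sim for j in chosen):
            chosen.append(i)
    return chosen


# Toy data: samples 0 and 1 are near-duplicates with small margins.
features = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [1.0, 1.0]]
scores = [[0.50, 0.45, 0.05], [0.50, 0.44, 0.06],
          [0.90, 0.05, 0.05], [0.40, 0.30, 0.30]]
picked = select_by_margin(features, scores, k=2)
```

In the toy data, sample 1 is skipped because it nearly duplicates sample 0, so the second slot goes to the next most uncertain sample that adds new information.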
Data Preprocessing
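| Review content | English gloss | Group | Word segmentation result |
|---|---|---|---|
| 看起来很新鲜,京东自营的生鲜质量很令人放心 | Looks very fresh; the quality of JD self-operated fresh produce is very reassuring | 5 | 看起来/很/新鲜,京东/自营/的/生鲜/质量/很/令人放心 |
| 大品牌鲜果,很新鲜,选这个牌子也是精挑细选了好久好久,送货速度快 | Big-brand fresh fruit, very fresh; I chose this brand after a long and careful selection; delivery was fast | 5 | 大品牌/鲜果,很/新鲜,选/这个/牌子/也是/精挑细选/了/好久好久,送货/速度/快 |
| 纯进口鲜果口感特别香甜软糯,值得大家购买 | Purely imported fresh fruit with an especially sweet, soft taste; worth buying | 5 | 纯/进口/鲜果/口感/特别/香甜/软糯,值得/大家/购买 |
| …… | …… | …… | …… |
| 这次买的不好,坏的很快 | This purchase was not good; it spoiled quickly | 1 | 这次/买/的/不好,坏/的/很快 |
| 70块钱买了一堆臭东西,最信任的平台 | Spent 70 yuan on a pile of rotten goods, from the platform I trusted most | 1 | 70/块钱/买/了/一堆/臭/东西,最/信任/的/平台 |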
Word Segmentation and Grouping of E-commerce Reviews
Number of Reviews on Main Product Attributes
The One vs All Strategy
Demonstration of the One vs All Decomposition
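
The one-vs-all decomposition named in these captions can be sketched as follows. This is a generic illustration rather than the authors' implementation: a problem with K sentiment classes is turned into K binary problems (class c versus the rest), and at prediction time the class whose binary classifier gives the highest score wins. The function names and the example scores are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def one_vs_all_labels(labels: Sequence[Hashable],
                      classes: Sequence[Hashable]) -> Dict[Hashable, List[int]]:
    """Build one binary label vector per class: +1 for that class, -1 for the rest."""
    return {c: [1 if y == c else -1 for y in labels] for c in classes}


def predict_one_vs_all(scores_by_class: Dict[Hashable, float]) -> Hashable:
    """Pick the class whose binary classifier returned the highest decision score."""
    return max(scores_by_class, key=scores_by_class.get)


# Five rating groups, as in the review data (1 = most negative, 5 = most positive).
ratings = [5, 1, 3, 5]
binary_tasks = one_vs_all_labels(ratings, classes=[1, 2, 3, 4, 5])

# Hypothetical decision scores from the five binary classifiers for one review.
predicted = predict_one_vs_all({1: -0.8, 2: -1.2, 3: 0.2, 4: 0.4, 5: 1.1})
```

Each entry of `binary_tasks` would be used to train one binary SVM, so five rating groups require five classifiers.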
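| Method | Metric | Labeled5: 5 | Labeled5: 4 | Labeled5: 3 | Labeled5: 2 | Labeled5: 1 | Labeled10: 5 | Labeled10: 4 | Labeled10: 3 | Labeled10: 2 | Labeled10: 1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (all methods) | Total reviews | 704 | 711 | 703 | 695 | 687 | 709 | 691 | 714 | 700 | 686 |
| Self-training | Precision (%) | 63.3 | 65.5 | 71.4 | 73.3 | 70.0 | 66.1 | 70.7 | 66.1 | 72.9 | 74.8 |
| Self-training | Recall (%) | 65.3 | 67.2 | 70.3 | 69.9 | 71.4 | 63.9 | 71.8 | 75.6 | 67.7 | 72.3 |
| Self-training | F1 (%) | 64.3 | 66.3 | 70.8 | 71.6 | 70.7 | 65.0 | 71.2 | 70.5 | 70.2 | 73.5 |
| Tri-training | Precision (%) | 70.2 | 73.5 | 75.3 | 75.9 | 74.0 | 72.5 | 74.3 | 76.2 | 73.3 | 75.8 |
| Tri-training | Recall (%) | 72.7 | 77.0 | 80.9 | 81.2 | 75.8 | 76.5 | 69.9 | 79.9 | 74.9 | 73.1 |
| Tri-training | F1 (%) | 73.4 | 77.7 | 80.6 | 80.5 | 76.9 | 74.4 | 72.0 | 78.0 | 72.5 | 74.4 |
| DW-TCI | Precision (%) | 75.7 | 76.7 | 78.1 | 77.4 | 76.0 | 77.1 | 77.6 | 78.9 | 77.9 | 76.5 |
| DW-TCI | Recall (%) | 74.7 | 77.7 | 81.4 | 81.8 | 76.9 | 77.8 | 73.4 | 79.2 | 74.3 | 74.5 |
| DW-TCI | F1 (%) | 75.2 | 78.1 | 80.8 | 80.9 | 77.1 | 77.9 | 74.5 | 78.6 | 75.3 | 76.4 |
| Improved SVM | Precision (%) | 76.1 | 77.3 | 78.5 | 77.9 | 76.5 | 77.4 | 78.4 | 79.2 | 78.1 | 78.8 |
| Improved SVM | Recall (%) | 75.1 | 77.9 | 81.9 | 82.2 | 77.1 | 78.5 | 75.1 | 78.9 | 76.2 | 77.4 |
| Improved SVM | F1 (%) | 76.7 | 78.5 | 81.2 | 81.5 | 78.3 | 78.6 | 75.7 | 79.2 | 77.5 | 78.1 |
| IMS-Tri-training | Precision (%) | 79.5 | 82.6 | 81.9 | 82.1 | 81.2 | 84.3 | 86.8 | 82.8 | 81.9 | 87.3 |
| IMS-Tri-training | Recall (%) | 84.1 | 80.2 | 84.4 | 83.0 | 82.8 | 87.3 | 82.2 | 79.0 | 79.1 | 84.9 |
| IMS-Tri-training | F1 (%) | 81.7 | 81.4 | 83.1 | 82.5 | 82.0 | 85.8 | 84.4 | 80.9 | 80.5 | 86.1 |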
Classification Effect of Five Semi-Supervised Learning Methods
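
The three metrics reported in the table above are related in the usual way, which the short sketch below illustrates. It assumes the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 as their harmonic mean); the page itself does not restate them. As a check, the IMS-Tri-training entry for sentiment category 5 under Labeled5 lists precision 79.5, recall 84.1, and F1 81.7, and the harmonic mean of the first two reproduces the third. The function names and the counts passed to them are hypothetical.

```python
from typing import Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Per-class metrics from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Reproduce one reported F1 value from its precision and recall (percent).
reported = round(f1_score(79.5, 84.1), 1)  # 81.7

# Hypothetical counts for one sentiment category.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
```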
Classification Accuracy of Semi-Supervised Learning Methods with Different Sample Sizes
Classification Analysis Time of Semi-Supervised Learning Methods with Different Sample Sizes
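| Distribution | Sum of squared errors | AIC | BIC | KL divergence |
|---|---|---|---|---|
| exponpow | 0.073 | 24.710 | -43633.213 | 0.210 |
| t | 0.107 | 24.130 | -42111.546 | 0.289 |
| norm | 0.107 | 22.130 | -42119.752 | 0.289 |
| lognorm | 0.107 | 24.134 | -42108.580 | 0.289 |
| cauchy | 0.109 | 26.339 | -42040.730 | 0.314 |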
Fitting Distribution of IMS-Tri-training Classification Data
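
The first metric in the fitting table, the sum of squared errors, can be illustrated with a small sketch. The page does not state how the error was computed, so the following is an assumption: the data are binned into a normalized histogram, a candidate distribution is fitted, and the squared differences between the histogram density and the fitted density at the bin centers are summed. Only the normal distribution is shown, using the standard library; the helper names and the toy data are hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple


def histogram_density(data: Sequence[float], bins: int) -> Tuple[List[float], List[float]]:
    """Return bin centers and a histogram normalized so that it integrates to 1."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must contain at least two distinct values")
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # The maximum value falls into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    density = [c / (len(data) * width) for c in counts]
    return centers, density


def sse_normal_fit(data: Sequence[float], bins: int = 10) -> float:
    """Sum of squared errors between the histogram and a fitted normal density."""
    fitted = NormalDist.from_samples(data)
    centers, density = histogram_density(data, bins)
    return sum((d - fitted.pdf(c)) ** 2 for c, d in zip(centers, density))


# Toy sample; a smaller value means the fitted curve follows the histogram more closely.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5]
error = sse_normal_fit(sample, bins=4)
```

Repeating the same computation for several candidate distributions and ranking them by this error would give a comparison in the spirit of the table, where exponpow has the smallest sum of squared errors.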
The Fitting of Different Distributions to the Sentiment Classification of Online Reviews