Data Analysis and Knowledge Discovery  2024, Vol. 8 Issue (5): 102-112     https://doi.org/10.11925/infotech.2096-3467.2023.0519
Research Paper
Sentiment Analysis of User Reviews Integrating Margin Sampling and Tri-training
Jiang Yiping1, Zhang Ting1, Xia Zhengming1, Li Yuhua2, Zhang Zhaotong1
1 College of Information Management, Nanjing Agricultural University, Nanjing 210031, China
2 College of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210031, China
Abstract

[Objective] This paper proposes a sentiment analysis method for user reviews that integrates margin sampling and Tri-training. It addresses three characteristics of review data: large volume, ambiguous sentiment tendencies, and short content. [Methods] First, we constructed a multi-class support vector machine based on a one-vs-all decomposition strategy. Then, we built the initial set with a margin sampling strategy that takes cosine similarity into account. Finally, we proposed a Tri-training algorithm with a soft voting mechanism. [Results] The improved voting mechanism further reduced the probability that the classifiers misjudge a sample when voting on its class, and precision was above 79% for every category. [Limitations] The proposed method does not consider extracting information from multimedia data. [Conclusions] Compared with traditional and recently improved semi-supervised learning algorithms, the proposed algorithm shows some advantage in classification accuracy and efficiency.
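
The abstract names a soft voting mechanism for Tri-training but does not spell it out. The following is a minimal sketch of one common reading, assuming each classifier outputs a class-probability vector and that a pseudo-label for the third classifier is accepted when the averaged probability of the other two reaches a threshold. The function name `soft_vote` and the `threshold` parameter are illustrative assumptions, not taken from the paper.

```python
from typing import Optional, Sequence


def soft_vote(
    prob_a: Sequence[float],
    prob_b: Sequence[float],
    threshold: float = 0.6,  # assumed confidence cut-off, not from the paper
) -> Optional[int]:
    """Average two classifiers' class probabilities.

    Returns the index of the winning class, or None when the averaged
    confidence is below the threshold (the sample stays unlabeled).
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("probability vectors must have the same length")
    averaged = [(a + b) / 2 for a, b in zip(prob_a, prob_b)]
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return best if averaged[best] >= threshold else None
```

Unlike hard majority voting, this rule can still accept a sample when the two classifiers disagree on the top class, provided one of them is confident enough to dominate the average.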

Keywords: User Reviews; Sentiment Analysis; Margin Sampling; Tri-training
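Received: 2023-05-31      Published: 2024-01-08
CLC number (ZTFLH): TP391; G350
Funding: Jiangsu Provincial Social Science Fund (21GLC003); Humanities and Social Sciences Research Planning Fund of the Ministry of Education (22YJA630033); Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX23_0229)
Corresponding author: Zhang Zhaotong, ORCID: 0000-0002-1155-8603, E-mail: zzt5576@njau.edu.cn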
Cite this article:
Jiang Yiping, Zhang Ting, Xia Zhengming, Li Yuhua, Zhang Zhaotong. Sentiment Analysis of User Reviews Integrating Margin Sampling and Tri-training. Data Analysis and Knowledge Discovery, 2024, 8(5): 102-112.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2023.0519      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2024/V8/I5/102
Fig.1  Sentiment analysis framework for user reviews integrating margin sampling and Tri-training
Fig.2  Improved margin sampling
Fig.3  Data preprocessing workflow
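
The improved margin sampling named in Fig.2 combines uncertainty with cosine similarity. A minimal sketch of how the two could interact is given here, under stated assumptions: the margin is taken as the gap between the two highest class probabilities, candidates are visited from smallest margin upward, and a candidate is skipped when it is too similar to one already chosen. The names `select_initial_set` and `max_similarity` are hypothetical, and the paper's actual rule may differ.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def margin(probs: Sequence[float]) -> float:
    """Gap between the two most probable classes; small means uncertain."""
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]


def select_initial_set(
    features: Sequence[Sequence[float]],
    probs: Sequence[Sequence[float]],
    k: int,
    max_similarity: float = 0.9,  # assumed redundancy cut-off
) -> List[int]:
    """Pick up to k uncertain samples that are not near-duplicates of each other."""
    by_uncertainty = sorted(range(len(probs)), key=lambda i: margin(probs[i]))
    chosen: List[int] = []
    for i in by_uncertainty:
        if len(chosen) >= k:
            break
        if all(
            cosine_similarity(features[i], features[j]) <= max_similarity
            for j in chosen
        ):
            chosen.append(i)
    return chosen
```

The similarity filter keeps the initial set from filling up with many near-identical short reviews that all sit close to the same decision boundary.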
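| Review content | English gloss | Group | Segmentation result |
|---|---|---|---|
| 看起来很新鲜,京东自营的生鲜质量很令人放心 | Looks very fresh; the quality of JD self-operated fresh produce is very reassuring | 5 | 看起来/很/新鲜,京东/自营/的/生鲜/质量/很/令人放心 |
| 大品牌鲜果,很新鲜,选这个牌子也是精挑细选了好久好久,送货速度快 | Big-brand fresh fruit, very fresh; I spent a long time carefully choosing this brand; delivery was fast | 5 | 大品牌/鲜果,很/新鲜,选/这个/牌子/也是/精挑细选/了/好久好久,送货/速度/快 |
| 纯进口鲜果口感特别香甜软糯,值得大家购买 | Purely imported fresh fruit with an especially sweet, soft taste; worth buying | 5 | 纯/进口/鲜果/口感/特别/香甜/软糯,值得/大家/购买 |
| … | … | … | … |
| 这次买的不好,坏的很快 | Bad purchase this time; it spoiled quickly | 1 | 这次/买/的/不好,坏/的/很快 |
| 70块钱买了一堆臭东西,最信任的平台 | Spent 70 yuan on a pile of rotten stuff, from the platform I trusted most | 1 | 70/块钱/买/了/一堆/臭/东西,最/信任/的/平台 |

Table 1  Word segmentation and grouping of online reviews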
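Fig.4  Number of reviews for the main product attributes
Fig.5  One-vs-all decomposition strategy
Fig.6  Demonstration of the one-vs-all decomposition process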
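
The one-vs-all decomposition in Fig.5 and Fig.6 turns a five-class problem into one binary problem per class. The sketch below shows only the decomposition and the final decision rule, assuming each binary classifier exposes a real-valued score and the class with the highest score wins. The binary scorers are passed in as plain callables, so no particular SVM library is implied, and the helper names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def one_vs_all_labels(labels: Sequence[int]) -> Dict[int, List[int]]:
    """For each class c, build binary targets: +1 where label == c, else -1."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else -1 for y in labels] for c in classes}


def predict_one_vs_all(
    scorers: Dict[int, Callable[[Sequence[float]], float]],
    x: Sequence[float],
) -> int:
    """Return the class whose binary scorer gives x the highest score."""
    return max(scorers, key=lambda c: scorers[c](x))
```

For example, `one_vs_all_labels([5, 1, 3, 5])` yields three binary target lists, one each for classes 1, 3, and 5, which could then be used to train three separate binary classifiers.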
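Column headers give the setting (Labeled5 or Labeled10) followed by the sentiment class (5 to 1).

| Method | Metric | Labeled5: 5 | Labeled5: 4 | Labeled5: 3 | Labeled5: 2 | Labeled5: 1 | Labeled10: 5 | Labeled10: 4 | Labeled10: 3 | Labeled10: 2 | Labeled10: 1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (all) | Total reviews | 704 | 711 | 703 | 695 | 687 | 709 | 691 | 714 | 700 | 686 |
| Self-training | Precision (%) | 63.3 | 65.5 | 71.4 | 73.3 | 70.0 | 66.1 | 70.7 | 66.1 | 72.9 | 74.8 |
| Self-training | Recall (%) | 65.3 | 67.2 | 70.3 | 69.9 | 71.4 | 63.9 | 71.8 | 75.6 | 67.7 | 72.3 |
| Self-training | F1 (%) | 64.3 | 66.3 | 70.8 | 71.6 | 70.7 | 65.0 | 71.2 | 70.5 | 70.2 | 73.5 |
| Tri-training | Precision (%) | 70.2 | 73.5 | 75.3 | 75.9 | 74.0 | 72.5 | 74.3 | 76.2 | 73.3 | 75.8 |
| Tri-training | Recall (%) | 72.7 | 77.0 | 80.9 | 81.2 | 75.8 | 76.5 | 69.9 | 79.9 | 74.9 | 73.1 |
| Tri-training | F1 (%) | 73.4 | 77.7 | 80.6 | 80.5 | 76.9 | 74.4 | 72.0 | 78.0 | 72.5 | 74.4 |
| DW-TCI | Precision (%) | 75.7 | 76.7 | 78.1 | 77.4 | 76.0 | 77.1 | 77.6 | 78.9 | 77.9 | 76.5 |
| DW-TCI | Recall (%) | 74.7 | 77.7 | 81.4 | 81.8 | 76.9 | 77.8 | 73.4 | 79.2 | 74.3 | 74.5 |
| DW-TCI | F1 (%) | 75.2 | 78.1 | 80.8 | 80.9 | 77.1 | 77.9 | 74.5 | 78.6 | 75.3 | 76.4 |
| Improved SVM | Precision (%) | 76.1 | 77.3 | 78.5 | 77.9 | 76.5 | 77.4 | 78.4 | 79.2 | 78.1 | 78.8 |
| Improved SVM | Recall (%) | 75.1 | 77.9 | 81.9 | 82.2 | 77.1 | 78.5 | 75.1 | 78.9 | 76.2 | 77.4 |
| Improved SVM | F1 (%) | 76.7 | 78.5 | 81.2 | 81.5 | 78.3 | 78.6 | 75.7 | 79.2 | 77.5 | 78.1 |
| IMS-Tri-training | Precision (%) | 79.5 | 82.6 | 81.9 | 82.1 | 81.2 | 84.3 | 86.8 | 82.8 | 81.9 | 87.3 |
| IMS-Tri-training | Recall (%) | 84.1 | 80.2 | 84.4 | 83.0 | 82.8 | 87.3 | 82.2 | 79.0 | 79.1 | 84.9 |
| IMS-Tri-training | F1 (%) | 81.7 | 81.4 | 83.1 | 82.5 | 82.0 | 85.8 | 84.4 | 80.9 | 80.5 | 86.1 |

Table 2  Classification performance of the 5 semi-supervised learning methods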
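
Table 2 reports precision, recall, and F1 per class. As a worked example, the standard definition of F1 as the harmonic mean of precision and recall can be applied to one cell of the table. The sketch assumes this standard definition; the function name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# IMS-Tri-training, Labeled5, sentiment class 5 (values from Table 2).
ims_class5_f1 = f1_score(79.5, 84.1)
print(round(ims_class5_f1, 1))  # 81.7, matching the F1 reported in the table
```

The same function can be used to cross-check any other precision and recall pair in the table.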
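Fig.7  Classification performance of the 5 semi-supervised learning methods under different sample sizes
Fig.8  Analysis time of the semi-supervised learning methods under different sample sizes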
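| Distribution | Sum of squared errors | AIC | BIC | KL divergence |
|---|---|---|---|---|
| exponpow | 0.073 | 24.710 | -43633.213 | 0.210 |
| t | 0.107 | 24.130 | -42111.546 | 0.289 |
| norm | 0.107 | 22.130 | -42119.752 | 0.289 |
| lognorm | 0.107 | 24.134 | -42108.580 | 0.289 |
| cauchy | 0.109 | 26.339 | -42040.730 | 0.314 |

Table 3  Comparison of fitted distributions for the IMS-Tri-training classification data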
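
Table 3 ranks candidate distributions by sum of squared errors, AIC, BIC, and KL divergence. This page does not state which definitions or software produced those values, so the sketch below uses the textbook forms only as an assumption: SSE between observed and fitted densities on shared bins, discrete KL divergence, AIC = 2k - 2 ln L, and BIC = k ln n - 2 ln L. It will not necessarily reproduce the table's numbers.

```python
import math
from typing import Sequence


def sum_squared_error(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """Sum of squared differences between observed and fitted densities."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Discrete KL(p || q); bins where p is zero contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion, textbook form."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayesian information criterion, textbook form."""
    return n_params * math.log(n_samples) - 2 * log_likelihood
```

Lower values indicate a better fit for all four criteria, which is consistent with exponpow having the smallest sum of squared errors and KL divergence in Table 3.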
Fig.9  Fit of different distributions to the product sentiment classification