Sentiment Analysis of User Reviews Integrating Margin Sampling and Tri-training
Jiang Yiping1, Zhang Ting1, Xia Zhengming1, Li Yuhua2, Zhang Zhaotong1
1College of Information Management, Nanjing Agricultural University, Nanjing 210031, China
2College of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210031, China
Abstract [Objective] This paper proposes a sentiment analysis method for user reviews that integrates margin sampling and Tri-training. It addresses three difficulties: the large volume of user reviews, their ambiguous sentiment tendencies, and their short length. [Methods] First, we constructed a multi-class support vector machine based on a one-vs-all decomposition strategy. Then, we used a margin sampling strategy that takes cosine similarity into account to build the initial set. Finally, we proposed a Tri-training algorithm that incorporates a soft voting mechanism. [Results] The proposed algorithm improves the voting mechanism of Tri-training, further reducing the probability that the multiple classifiers misclassify a sample. Precision exceeded 79% for every category. [Limitations] The proposed method does not extract information from multimedia data. [Conclusions] Compared with traditional and recently improved semi-supervised learning algorithms, the proposed algorithm achieves higher classification accuracy and efficiency.
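The two selection ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the score and probability values, and the three-class setup are all hypothetical, and the cosine-similarity term that the paper adds to margin sampling is left out. Margin sampling is shown in its common form (pick the samples whose two highest class scores are closest), and soft voting as a plain average of class probabilities from three classifiers.

```python
from typing import List, Sequence


def margin(scores: Sequence[float]) -> float:
    """Gap between the two highest class scores; a small gap means an uncertain sample."""
    top, second = sorted(scores, reverse=True)[:2]
    return top - second


def margin_sample(score_rows: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k samples with the smallest margins."""
    order = sorted(range(len(score_rows)), key=lambda i: margin(score_rows[i]))
    return order[:k]


def soft_vote(prob_rows: Sequence[Sequence[float]]) -> int:
    """Average class probabilities across classifiers and return the best class index."""
    n_classes = len(prob_rows[0])
    avg = [sum(p[c] for p in prob_rows) / len(prob_rows) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])


# Hypothetical one-vs-all scores for four reviews over three sentiment classes.
scores = [
    [0.90, 0.10, 0.00],
    [0.50, 0.45, 0.05],
    [0.20, 0.30, 0.50],
    [0.34, 0.33, 0.33],
]
picked = margin_sample(scores, k=2)  # the two most ambiguous reviews

# Hypothetical class probabilities from three classifiers for one unlabeled review.
# Two classifiers lean weakly toward class 0; one is strongly confident in class 1.
probs = [
    [0.51, 0.49, 0.00],
    [0.52, 0.48, 0.00],
    [0.05, 0.95, 0.00],
]
label = soft_vote(probs)
```

In this toy input a simple majority vote would label the review as class 0, while averaging the probabilities lets the one confident classifier outweigh two near-ties. This illustrates why a soft vote can reduce misjudgment, though the paper's exact voting rule may differ.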
Received: 31 May 2023
Published: 08 January 2024
Fund: Social Science Foundation of Jiangsu Province (21GLC003); Humanity and Social Science Project of Ministry of Education of China (22YJA630033); Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX23_0229)
Corresponding Author: Zhang Zhaotong, ORCID: 0000-0002-1155-8603, E-mail: zzt5576@njau.edu.cn