Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (3): 46-53    DOI: 10.11925/infotech.2096-3467.2017.03.06
Original Article
Sentiment Analysis of Trending Topics Based on Relevance
He Yue, Xiao Min, Zhang Yue
Business School, Sichuan University, Chengdu 610064, China
Abstract  

[Objective] This paper analyzes the sentiment of trending topics on microblogs using machine learning techniques. [Methods] First, we propose a classification model based on trending-topic relevance to extract subjective microblog posts. Second, we analyze sentiment orientation with an improved machine learning method. [Results] The modified model improved both the subjective-objective classification and the sentiment orientation classification of trending-topic posts; the overall F-measures increased by 7.4 and 2.2 percentage points, respectively. [Limitations] More research is needed on the distribution of the data, the granularity of emotion, and changes in sentiment trends. [Conclusions] Adding a topic relevance factor to the model improves the sentiment analysis of microblog posts and makes it possible to extract the sentiment tendency toward key objects in trending topics, which provides intelligence for microblog marketing.

Keywords: Trending Topic; Subjective-Objective Classification; Emotion Orientation Classification; TF-IDF-SIM; Machine Learning
Received: 17 October 2016      Published: 20 April 2017
ZTFLH:  G350  

Cite this article:

He Yue, Xiao Min, Zhang Yue. Sentiment Analysis of Trending Topics Based on Relevance. Data Analysis and Knowledge Discovery, 2017, 1(3): 46-53.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.03.06     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I3/46

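Comparison item | Feature | Value
Five commonly used classification features | Contains a sentiment word | 0, 1
Five commonly used classification features | Contains an exclamation mark | 0, 1
Five commonly used classification features | Contains a question mark | 0, 1
Five commonly used classification features | Contains an opinion word | 0, 1
Five commonly used classification features | Contains a degree adverb | 0, 1
Three new features introduced by Zhang Xiang [19] | Contains a pronoun or noun | 0, 1
Three new features introduced by Zhang Xiang [19] | Number of sentences in the post | Real
Three new features introduced by Zhang Xiang [19] | Number of words in the post | Real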
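
The eight features in the table above can be read as a small feature-extraction step for subjective-objective classification. The following is a minimal sketch, not the authors' implementation: the function name `subjectivity_features`, the toy lexicons, and the part-of-speech tag scheme (tags starting with `n` for nouns, `r` for pronouns, `w` for punctuation) are all assumptions made for illustration, and the input is assumed to be already segmented and tagged.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicons for illustration only; the paper's word lists are not given here.
SENTIMENT_WORDS = {"喜欢", "失望"}
OPINION_WORDS = {"认为", "觉得"}
DEGREE_ADVERBS = {"非常", "很"}
SENTENCE_ENDINGS = {"。", "！", "？", "!", "?"}
PUNCT_TAG = "w"  # assumed POS tag for punctuation


def subjectivity_features(tokens: List[Tuple[str, str]]) -> Dict[str, int]:
    """Map a segmented, POS-tagged post to the eight features in the table."""
    words = [word for word, _ in tokens]
    tags = [tag for _, tag in tokens]
    endings = sum(1 for word in words if word in SENTENCE_ENDINGS)
    return {
        "has_sentiment_word": int(any(w in SENTIMENT_WORDS for w in words)),
        "has_exclamation": int(any("！" in w or "!" in w for w in words)),
        "has_question": int(any("？" in w or "?" in w for w in words)),
        "has_opinion_word": int(any(w in OPINION_WORDS for w in words)),
        "has_degree_adverb": int(any(w in DEGREE_ADVERBS for w in words)),
        # Assumed tag scheme: "n..." marks nouns, "r..." marks pronouns.
        "has_pronoun_or_noun": int(any(t.startswith(("n", "r")) for t in tags)),
        # A post with text but no closing punctuation still counts as one sentence.
        "n_sentences": max(endings, 1) if words else 0,
        # Punctuation tokens are not counted as words.
        "n_words": sum(1 for t in tags if t != PUNCT_TAG),
    }


post = [("我", "r"), ("非常", "d"), ("喜欢", "v"),
        ("这部", "r"), ("电影", "n"), ("！", "w")]
features = subjectivity_features(post)
```

The resulting dictionary could be turned into a numeric vector and passed to any classifier; which classifier and which lexicons to use are choices the table itself does not fix.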
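
Feature type | Feature content | Description | Value
Emoticons | Number of sentiment emoticons | Sina Weibo default emoticon set | Real
Sentiment words | Number of sentiment words | HowNet word set for sentiment analysis | Real
Internet slang | Number of Internet slang words | Manually collected Internet slang dictionary containing positive and negative words | Real
Negation words | Whether a negation word appears | Whether a negation word precedes a sentiment word (23 negation words, taken from the HowNet dictionary) | 0, 1
Degree adverbs | Whether a degree adverb appears | Degree word list in the HowNet dictionary | 0, 1
Modal particles | Whether a modal particle appears | 25 modal particles such as “呀”, “啦”, “呢”, “吧”, “啊” | 0, 1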
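
Feature type | Feature content | Description | Value
Positive emoticons | Number of positive emoticons | Sina Weibo default emoticon set | Real
Negative emoticons | Number of negative emoticons | Sina Weibo default emoticon set | Real
Positive sentiment words | Number of positive sentiment words | Positive sentiment words in HowNet | Real
Negative sentiment words | Number of negative sentiment words | Negative sentiment words in HowNet | Real
Positive Internet slang | Number of positive Internet slang words | Dictionary of positive Internet slang | Real
Negative Internet slang | Number of negative Internet slang words | Dictionary of negative Internet slang | Real
Negation words | Whether a negation word appears | Whether a negation word precedes a sentiment word (within the 3 words before the sentiment word) | 0, 1
Degree adverbs | Whether a degree adverb appears | Degree word list in the HowNet dictionary | 0, 1
Modal particles | Whether a modal particle appears | 25 modal particles such as “呀”, “啦”, “呢”, “吧”, “啊” | 0, 1
Adversative words | Whether an adversative word appears | 7 common words such as “但是”, “可是”, “然而” | 0, 1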
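
The negation rule in the table above (a negation word within the three words before a sentiment word) and the polarity-split counts can be sketched as follows. This is an illustrative sketch under stated assumptions: the function `polarity_features` and the tiny lexicons are hypothetical stand-ins for the HowNet and Internet slang dictionaries, and the input is assumed to be an already segmented word list.

```python
from typing import Dict, List

# Hypothetical toy lexicons standing in for the HowNet and slang dictionaries.
POSITIVE_WORDS = {"喜欢", "精彩"}
NEGATIVE_WORDS = {"失望", "难看"}
NEGATION_WORDS = {"不", "没", "没有"}
ADVERSATIVES = {"但是", "可是", "然而"}


def polarity_features(words: List[str], window: int = 3) -> Dict[str, int]:
    """Compute a subset of the polarity features for one segmented post."""
    sentiment_words = POSITIVE_WORDS | NEGATIVE_WORDS
    negated = 0
    for i, word in enumerate(words):
        # Look only at the `window` words immediately before a sentiment word.
        preceding = words[max(0, i - window):i]
        if word in sentiment_words and any(p in NEGATION_WORDS for p in preceding):
            negated = 1
            break
    return {
        "n_positive_words": sum(1 for w in words if w in POSITIVE_WORDS),
        "n_negative_words": sum(1 for w in words if w in NEGATIVE_WORDS),
        "has_negation": negated,
        "has_adversative": int(any(w in ADVERSATIVES for w in words)),
    }


near = polarity_features(["我", "不", "太", "喜欢", "这部", "电影"])
far = polarity_features(["不", "过", "剧情", "真的", "很", "精彩", "但是", "结局", "难看"])
```

In the first post the negation word sits two positions before the sentiment word, so the flag is set; in the second it lies outside the three-word window of both sentiment words, so the flag stays at 0.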
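
Sentiment polarity | Subjective and relevant: positive | Subjective and relevant: negative | Other: subjective and irrelevant | Other: objective and relevant | Other: objective and irrelevant
Total | 38,022 | 24,598 | 10,596 | 11,071 | 4,284
Group total | 62,620 (subjective and relevant) | 10,596 (subjective and irrelevant) | 15,355 (objective)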
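
Comparison item | Text category | Count (posts) | Precision (%) | Recall (%) | F-value (%)
SVM classification | Subjective, topic-relevant | 53,356 | 82.5 | 89.3 | 85.8
SVM classification | Subjective, topic-irrelevant | 10,127 | 76.7 | 93.3 | 84.2
SVM classification | Objective, topic-relevant | 15,365 | 68.5 | 73.8 | 71.1
SVM classification | Objective, topic-irrelevant | 9,723 | 53.9 | 55.6 | 54.7
Logistic regression | Topic-relevant and subjective | 53,285 | 83.6 | 89.0 | 86.2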
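
The F-values in the table above are consistent with the standard balanced F-measure, the harmonic mean of precision and recall. The short check below assumes that definition (the page itself does not state the formula); the helper name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) pairs copied from the table rows, in order.
rows = [(82.5, 89.3), (76.7, 93.3), (68.5, 73.8), (53.9, 55.6), (83.6, 89.0)]
f_values = [round(f_measure(p, r), 1) for p, r in rows]
# f_values == [85.8, 84.2, 71.1, 54.7, 86.2], matching the reported F-values
```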
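
Comparison item | Subjective precision (%) | Subjective recall (%) | Subjective F-value (%) | Objective precision (%) | Objective recall (%) | Objective F-value (%) | Overall F-value (%)
Without the topic relevance classification sub-model | 76.8 | 94.1 | 84.6 | 66.6 | 42.1 | 51.6 | 72.3
With the topic relevance classification sub-model | 88.2 | 92.3 | 90.2 | 81.5 | 53.8 | 66.8 | 79.7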
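
Comparison item | Sentiment orientation | Count (posts) | Precision (%) | Recall (%) | F-value (%)
Before improvement | Positive | 34,479 | 80.5 | 87.6 | 83.9
Before improvement | Negative | 18,806 | 73.2 | 79.1 | 76.0
After improvement | Positive | 33,941 | 84.3 | 90.3 | 87.2
After improvement | Negative | 19,344 | 79.8 | 77.6 | 78.7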
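
Comparison item | Positive precision (%) | Positive recall (%) | Positive F-value (%) | Negative precision (%) | Negative recall (%) | Negative F-value (%) | Overall F-value (%)
Without the subjective-objective classification model combined with topic relevance | 81.5 | 92.0 | 86.4 | 67.9 | 82.8 | 74.6 | 81.7
With the subjective-objective classification model combined with topic relevance | 84.3 | 90.3 | 87.2 | 79.8 | 77.6 | 78.7 | 83.9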
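
The improvements quoted in the abstract can be recovered from the overall F-values in the two before/after comparisons (subjective-objective classification and sentiment orientation classification). The snippet below is a simple arithmetic check; the dictionary keys are labels chosen here for illustration.

```python
# Overall F-values (%) as (without, with) pairs, copied from the comparison tables.
overall_f = {
    "subjective-objective classification": (72.3, 79.7),
    "sentiment orientation classification": (81.7, 83.9),
}

# Gains in percentage points, rounded to one decimal to absorb float noise.
gains = {task: round(after - before, 1) for task, (before, after) in overall_f.items()}
# gains["subjective-objective classification"] == 7.4
# gains["sentiment orientation classification"] == 2.2
```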
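
Comparison item | Hashtag | 冯小刚 (Feng Xiaogang) | 私人订制 (Personal Tailor) | 小故事 (short stories) | 葛优 (Ge You)
Positive posts (count) | 13,526 | 7,158 | 2,330 | 417 | 532
Negative posts (count) | 7,945 | 4,415 | 3,052 | 508 | 365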
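
One way to read the per-object counts above is as a share of positive posts for each key object. The sketch below is only an illustration of that reading: the helper `positive_share` and the simple majority rule for the dominant polarity are assumptions, not a method described on this page.

```python
from typing import Dict, Tuple

# (positive count, negative count) per key object, copied from the table.
counts: Dict[str, Tuple[int, int]] = {
    "Hashtag": (13526, 7945),
    "冯小刚": (7158, 4415),
    "私人订制": (2330, 3052),
    "小故事": (417, 508),
    "葛优": (532, 365),
}


def positive_share(positive: int, negative: int) -> float:
    """Fraction of sentiment-bearing posts that are positive."""
    total = positive + negative
    return positive / total if total else 0.0


shares = {obj: round(positive_share(p, n), 2) for obj, (p, n) in counts.items()}
# Assumed rule: the polarity with more posts is taken as the dominant tendency.
dominant = {obj: "positive" if p > n else "negative" for obj, (p, n) in counts.items()}
```

Under this reading, the hashtag as a whole, 冯小刚, and 葛优 lean positive, while 私人订制 and 小故事 lean negative.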

[1] Chen Guolan. Microblog Sentiment Analysis Basing on Emotion Dictionary and Semantic Rule[J]. Information Research, 2016(2): 1-6. (in Chinese) doi: 10.3969/j.issn.1005-8095.2016.02.001
[2] Gui Bin, Yang Xiaoping, Zhang Zhongxia, et al. Research on Building Lexicon for Sentiment Analysis Based on the Chinese Microblogging Smiley[J]. Transactions of Beijing Institute of Technology, 2014, 34(5): 537-541. (in Chinese)
[3] Bravo-Marquez F, Frank E, Pfahringer B. Building a Twitter Opinion Lexicon from Automatically-annotated Tweets[J]. Knowledge-Based Systems, 2016, 108(SI). doi: 10.1016/j.knosys.2016.05.018
[4] Ning Hui, Yang Song, Zhao Yong, et al. Study of Microblog Sentiment Analysis Based on Semantic Feature[J]. Applied Science and Technology, 2016, 43(3): 70-74. (in Chinese) doi: 10.11991/yykj.201506036
[5] Zhou Z, Zhang X, Sanderson M. Sentiment Analysis on Twitter Through Topic-Based Lexicon Expansion[A]// Databases Theory and Applications[M]. Springer International Publishing, 2014: 98-109.
[6] Saif H, Fernandez M, He Y, et al. SentiCircles for Contextual and Conceptual Semantic Sentiment Analysis of Twitter[A]// The Semantic Web: Trends and Challenges[M]. Springer, Cham, 2014: 83-98.
[7] Saif H, He Y, Fernandez M, et al. Adapting Sentiment Lexicons Using Contextual Semantics for Sentiment Analysis of Twitter[A]// The Semantic Web: ESWC 2014 Satellite Events[M]. Springer, Cham, 2014: 54-63.
[8] Saif H, He Y, Fernandez M, et al. Contextual Semantics for Sentiment Analysis of Twitter[J]. Information Processing & Management, 2015, 52(1): 5-19. doi: 10.1016/j.ipm.2015.01.005
[9] Saif H, Fernandez M, Kastler L, et al. A Linked Open Data Approach for Sentiment Lexicon Adaptation[C]// Proceedings of the 15th International Semantic Web Conference. 2016.
[10] Zhao J, Cao X. Combining Semantic and Prior Polarity for Boosting Twitter Sentiment Analysis[C]// Proceedings of the 2015 IEEE International Conference on Smart City/Socialcom/Sustaincom. IEEE, 2015: 832-837.
[11] Le B, Nguyen H. Twitter Sentiment Analysis Using Machine Learning Techniques[A]// Advanced Computational Methods for Knowledge Engineering[M]. Springer International Publishing, 2015: 279-289.
[12] Qasem M, Thulasiram R, Thulasiram P. Twitter Sentiment Classification Using Machine Learning Techniques for Stock Markets[C]// Proceedings of the 2015 International Conference on Advances in Computing, Communications and Informatics. IEEE, 2015.
[13] Palguna D, Joshi V, Chakaravarthy V, et al. Analysis of Sampling Algorithms for Twitter[C]// Proceedings of the 24th International Joint Conference on Artificial Intelligence. AAAI Press, 2015.
[14] Song K, Feng S, Gao W, et al. Personalized Sentiment Classification Based on Latent Individuality of Microblog Users[C]// Proceedings of the 24th International Joint Conference on Artificial Intelligence. AAAI Press, 2015.
[15] Abdelwahab O, Bahgat M, Lowrance C J, et al. Effect of Training Set Size on SVM and Naive Bayes for Twitter Sentiment Analysis[C]// Proceedings of the IEEE International Symposium on Signal Processing and Information Technology. 2015: 46-51.
[16] Saif H, He Y, Alani H, et al. On Stopwords, Filtering and Data Sparsity for Sentiment Analysis of Twitter[C]// Proceedings of the 9th International Conference on Language Resources and Evaluation. 2014.
[17] Ah-Pine J, Morales E P S. A Study of Synthetic Oversampling for Twitter Imbalanced Sentiment Analysis[C]// Proceedings of the Workshop on Interactions Between Data Mining and Natural Language Processing. 2016.
[18] Sabariah M K, Effendy V. Sentiment Analysis on Twitter Using the Combination of Lexicon-based and Support Vector Machine for Assessing the Performance of a Television Program[C]// Proceedings of the International Conference on Information and Communication Technology. 2015.
[19] Zhang Xiang. Research on Sentiment Analysis for Hot Topic Microblog[D]. Harbin: Harbin Institute of Technology, 2013. (in Chinese)
[20] Wu Qinglin, Wang Yan. Research on the Emotional Feature Selection Method in the Chinese Microblog[J]. Journal of Inner Mongolia Normal University: Natural Science Edition, 2016, 45(1): 84-88. (in Chinese)
[21] Tian Jiule, Zhao Wei. Words Similarity Algorithm Based on Tongyici Cilin in Semantic Web Adaptive Learning System[J]. Journal of Jilin University: Information Science Edition, 2010, 28(6): 602-608. (in Chinese) doi: 10.3969/j.issn.1671-5896.2010.06.011