Data Analysis and Knowledge Discovery, 2021, Vol. 5, Issue 2: 43-49     https://doi.org/10.11925/infotech.2096-3467.2020.1059
Grouping Microblog Users of Trending Topics Based on Sentiment Analysis *
Zhang Mengyao, Zhu Guangli (corresponding author), Zhang Shunxiang, Zhang Biao
School of Computer Science and Engineering, Anhui University of Science & Technology, Huainan 232001, China

Abstract

[Objective] This paper proposes a model for grouping the users who take part in trending topics on Weibo. [Methods] Approaching the problem from the perspective of sentiment analysis, we first computed the sentiment value of each user's text with a sentiment dictionary. We then combined this sentiment value with a vector representation of the user's text to construct the user's opinion-sentiment features. Finally, we grouped the users with the K-means method. [Results] The proposed model divided the users into three categories, and the evaluation index CA reached 78.2%. [Limitations] The number of categories must be specified before the model can divide users into groups. [Conclusions] The results confirm the effectiveness of the constructed model and the selected features. The model groups users with high accuracy and clusters users who hold the same sentimental views together.

Key words: Microblog; Sentiment Analysis; Dictionary; Clustering; User Group Classification
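The abstract says each user's text is scored with a sentiment dictionary but does not give the dictionary or the scoring rules. As a minimal sketch only, assuming already word-segmented text and hypothetical toy lexicons (`SENTIMENT`, `NEGATIONS`, `DEGREE` are illustrative, not the authors' dictionary), a dictionary-based score can sum word polarities, with negation words flipping and degree adverbs scaling the next sentiment word:

```python
# Hypothetical toy lexicons for illustration; not the paper's sentiment dictionary.
SENTIMENT = {"恭喜": 2.0, "喜欢": 1.0, "希望": 1.0, "心疼": -1.0, "可怜": -1.5, "可惜": -1.0}
NEGATIONS = {"不", "没", "别"}               # negation words flip polarity
DEGREE = {"很": 1.5, "真": 1.5, "太": 2.0}   # degree adverbs scale intensity


def text_sentiment(tokens):
    """Score a word-segmented text by summing lexicon polarities.

    Negation words and degree adverbs adjust the next sentiment word that follows them.
    """
    score = 0.0
    modifier = 1.0
    for tok in tokens:
        if tok in NEGATIONS:
            modifier *= -1.0
        elif tok in DEGREE:
            modifier *= DEGREE[tok]
        elif tok in SENTIMENT:
            score += modifier * SENTIMENT[tok]
            modifier = 1.0  # reset once a sentiment word has consumed the modifiers
    return score


if __name__ == "__main__":
    print(text_sentiment(["恭喜", "亮哥"]))        # 2.0
    print(text_sentiment(["某馨", "真", "可怜"]))  # -2.25
```

Word segmentation itself is out of scope here; any Chinese segmenter could supply the token lists.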
Received: 2020-10-28      Published: 2020-12-15
Chinese Library Classification (ZTFLH): TP393
Funding: *National Natural Science Foundation of China (62076006); Natural Science Foundation of Anhui Province (1908085MF189)
Corresponding author: Zhu Guangli   ORCID: 0000-0003-4364-866X   E-mail: glzhu@aust.edu.cn
Cite this article:
Zhang Mengyao, Zhu Guangli, Zhang Shunxiang, Zhang Biao. Grouping Microblog Users of Trending Topics Based on Sentiment Analysis. Data Analysis and Knowledge Discovery, 2021, 5(2): 43-49.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.1059      或      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I2/43
Fig.1  Research framework of the grouping model
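Word | Frequency | Word | Frequency
李某 (Li X, name anonymized) | 461 | 希望 (hope) | 102
贾某 (Jia X, name anonymized) | 328 | 举报 (report) | 96
出轨 (infidelity) | 299 | 喜欢 (like) | 94
离婚 (divorce) | 288 | 网页 (web page) | 93
[missing] | 251 | 心疼 (feel sorry for) | 88
孩子 (child) | 246 | 明星 (celebrity) | 84
恭喜 (congratulations) | 189 | 终于 (finally) | 81
某馨 (X Xin, name anonymized) | 188 | 评论 (comment) | 69
视频 (video) | 167 | 家庭 (family) | 68
哈哈哈 (hahaha) | 123 | 可怜 (pitiful) | 65
女人 (woman) | 117 | 好好 (properly, well) | 65
[missing] | 106 | 可惜 (a pity) | 59
Table 1  Selected feature words and their frequencies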
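Table 1 lists high-frequency feature words, but this page does not describe how they were selected. A minimal sketch of how such a list could be produced, assuming the comments are already word-segmented and using a hypothetical stop-word set, is a plain frequency count with `collections.Counter`:

```python
from collections import Counter

# Hypothetical stop-word set; the paper's preprocessing is not described on this page.
STOP_WORDS = {"的", "了", "是", "吧", "吗"}


def top_feature_words(segmented_comments, k):
    """Return the k most frequent non-stop-words as (word, count) pairs."""
    counts = Counter(
        tok
        for tokens in segmented_comments  # each comment is a list of words
        for tok in tokens
        if tok not in STOP_WORDS
    )
    return counts.most_common(k)


if __name__ == "__main__":
    comments = [["恭喜", "亮哥"], ["恭喜", "心疼", "孩子"], ["恭喜", "亮哥", "吧"]]
    print(top_feature_words(comments, 2))  # [('恭喜', 3), ('亮哥', 2)]
```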
User    Opinion-sentiment feature vector
Since-孟孟孟子 0 0 0 0 0 0 0 0 0 0 0 4.77 0 0 0 5.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
类阿类- 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
殷进媳妇儿 0 0 0 0 0 0 0 0 0 0 0 0 5.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
柑g子 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
勋鹿家的小仙女 0 0 0 0 0 0 0 0 0 0 0 4.77 0 0 0 5.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
阿桐baby 0 0 0 0 0 0 0 0 0 0 0 9.54 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
水瓶座佳佳大本营 0 0 0 0 0 0 0 0 0 0 0 3.18 0 0 0 3.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Table 2  Selected opinion-sentiment feature vectors
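The abstract states that each user's sentiment value is combined with a vector representation of the user's text, but this page does not give the combination rule. The sketch below assumes, for illustration only, a term-count vector over the feature vocabulary with the sentiment value appended as one extra dimension weighted by a hypothetical parameter `alpha`; it is not necessarily how the values in Table 2 were computed.

```python
def opinion_sentiment_features(tokens, vocabulary, sentiment, alpha=1.0):
    """Hypothetical combination: term counts over the feature vocabulary,
    with the text's sentiment value appended as one extra, alpha-weighted dimension."""
    index = {word: i for i, word in enumerate(vocabulary)}
    vector = [0.0] * len(vocabulary)
    for tok in tokens:
        if tok in index:  # words outside the feature vocabulary are ignored
            vector[index[tok]] += 1.0
    return vector + [alpha * sentiment]


if __name__ == "__main__":
    vocab = ["恭喜", "心疼", "孩子"]
    print(opinion_sentiment_features(["恭喜", "亮哥", "恭喜"], vocab, sentiment=4.0, alpha=0.5))
    # [2.0, 0.0, 0.0, 2.0]
```

Appending the sentiment as its own dimension keeps it visible to the clustering step even when a user shares no feature words with others; `alpha` controls how strongly it pulls users together.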
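No. | User index | User | Comment
1 | 3 | 一个平凡人的吃瓜专用微博 | If you no longer love each other, just separate. All this chaos is exhausting even to watch. I hope X Xin grows up healthy.
6 | 8 | 起名太难不会改 | Let each of you be well. Marriage is like drinking water: only the drinker knows whether it is warm or cold. As outsiders we really can't say who is right or wrong. Treat X Xin well; I like your daughter.
13 | 15 | 就叫阿馨呀 | Treat the child well, and don't drag the child into the adults' grievances and disputes.
137 | 139 | 傻fufu猪猪 | You really are a "green tea" (a scheming woman who feigns innocence). My heart aches for Brother Liang and X Xin.
315 | 317 | 做不了你的洛璃但想见你 | I feel sorry for Brother Liang and X Xin.
405 | 407 | 蓝忘机0818 | Mostly I feel sorry for X Xin and Jia X.
431 | 433 | foxfoxy琳 | I feel sorry for the child.
439 | 441 | 梦509 | I feel sorry for X Xin.
609 | 611 | 如何为了自己奋斗一次 | Respect each other and give the child a healthy environment to grow up in. Children should not pay for the adults' mistakes.
738 | 740 | shan-yolo | X Xin is so pitiful.
1121 | 1123 | 长宁2004 | Did you never think about the child?
Table 3  Selected users in the second group and their comments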
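Table 3 shows members of one of the three groups produced by K-means. A minimal sketch of Lloyd's K-means in plain Python follows. The farthest-first seeding is an assumption (this page does not say how the initial centroids were chosen), used here only to make the result deterministic; like the paper's model, it needs the number of groups `k` fixed in advance.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at points[0], then repeatedly
    add the point farthest from its nearest chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means: alternate nearest-centroid assignment and centroid update."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [-10, 10], [-10, 11], [-11, 10]]
    labels, _ = kmeans(pts, 3)
    print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```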
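No. | User index | User | Comment
18 | 20 | Since-孟孟孟子 | Congratulations, Brother Liang.
39 | 41 | 勋鹿家的小仙女 | Congratulations, Brother Liang.
43 | 45 | 阿桐baby | Congratulations.
49 | 51 | 水瓶座佳佳大本营 | Congratulations, Brother Liang, on escaping misery.
108 | 110 | Rsskcs | Congratulations, Brother Liang.
134 | 136 | Sai平安喜乐 | Congratulations, Brother Liang, on escaping misery.
162 | 164 | 再无感100 | Congratulations, Brother Liang, on escaping misery.
170 | 172 | 小赞哥呀 | Congratulations to Jia X on escaping misery and being happily reborn.
172 | 174 | H魔法师阿狸H | Congratulations to Liangliang on escaping misery.
191 | 193 | 禹棹奂女朋友 | The statement has finally been issued. Congratulations, Brother Liang.
205 | 207 | 乱花渐欲迷人眼520 | Congratulations, Brother Liang, on being happily single.
Table 4  Selected users in the third group and their comments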
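The abstract reports CA = 78.2% without defining CA on this page. Assuming CA denotes clustering accuracy (a common reading, not confirmed here), it is the share of users whose cluster maps to their manually labelled group under the best one-to-one cluster-to-group mapping. A brute-force sketch, practical for three groups:

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_labels):
    """Share of items whose cluster maps to their true class under the best
    one-to-one cluster-to-class mapping (brute force; fine for k = 3)."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes: no one-to-one mapping exists")
    best = 0
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        hits = sum(mapping[c] == t for c, t in zip(cluster_labels, true_labels))
        best = max(best, hits)
    return best / len(true_labels)


if __name__ == "__main__":
    truth = ["neg", "neg", "pos", "pos", "neu", "neu"]
    found = [2, 2, 0, 0, 1, 0]
    print(round(clustering_accuracy(truth, found), 4))  # 0.8333
```

For many clusters, the permutation search grows factorially; the Hungarian algorithm solves the same matching in polynomial time.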