Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (6): 25-35     https://doi.org/10.11925/infotech.2096-3467.2020.0077
Research Paper
Clustering User Groups of Public Opinion Events from Multi-dimensional Social Network
Wang Xiwei 1,2,3, Jia Ruonan 1, Wei Yanan 1, Zhang Liu 1
1 School of Management, Jilin University, Changchun 130022, China
2 Research Center for Big Data Management, Jilin University, Changchun 130022, China
3 Cyberspace Governance Research Center, Jilin University, Changchun 130022, China

Abstract

[Objective] User groups are the main units through which public opinion spreads. This study clusters the user groups involved in public opinion events to give regulators and social network service providers a new way to identify group characteristics and apply targeted management measures. [Methods] Drawing on group theory (in the social-psychological sense), we clustered users by their influence, sentiment, and behavior. We collected user data from Sina Weibo, clustered the users with the Canopy and K-Means algorithms, and visualized the results with Neo4j and Weka. [Results] User groups within the same public opinion event differed in sentiment, influence, and behavior, while user groups from different events also shared some characteristics. [Limitations] Both events took place at Chinese universities, and Sina Weibo was the only data source. [Conclusions] The clustering results support management strategies tailored to each user group, both within a single public opinion event and across different events.

Keywords: Multi-dimensional; Social Network; Public Opinion; User Group; User Clustering
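Received: 2020-02-03      Published: 2021-07-06
CLC number: TP393
Funding: Jilin University Special Research Project on National Development and Security (Biosecurity) (2020JDGFAZ003); Jilin University Graduate Innovation Fund (101832020CX057)
Corresponding author: Jia Ruonan     E-mail: 2943442131@qq.com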
Cite this article:
Wang Xiwei, Jia Ruonan, Wei Yanan, Zhang Liu. Clustering User Groups of Public Opinion Events from Multi-dimensional Social Network. Data Analysis and Knowledge Discovery, 2021, 5(6): 25-35.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0077      or      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I6/25
Fig.1  Clustering analysis method for social network public opinion user groups
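The abstract names Canopy and K-Means as the clustering algorithms, but this page does not give their parameters or feature encoding. As a minimal sketch, assuming each user is a three-value vector (sentiment, influence, behavior) and assuming a hypothetical tight threshold `t2`, the Python below runs a Canopy-style pass to pick seed centers and then refines them with K-Means. The feature values, the threshold, and the function names are illustrative and not taken from the study; the loose Canopy threshold T1, which only governs overlapping canopy membership, is left out.

```python
import math

def canopy_seeds(points, t2):
    """Canopy-style seeding: every point within t2 of a chosen seed is dropped."""
    remaining = list(points)
    seeds = []
    while remaining:
        seed = remaining.pop(0)  # deterministic choice: first remaining point
        seeds.append(seed)
        remaining = [p for p in remaining if math.dist(p, seed) > t2]
    return seeds

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def k_means(points, centers, max_iter=100):
    """Lloyd's algorithm started from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centers:  # the centers stopped moving
            break
        centers = updated
    return centers, [nearest(p, centers) for p in points]

# Hypothetical users as (sentiment, influence, behavior) scores scaled to [0, 1].
users = [(0.9, 0.1, 0.2), (0.8, 0.2, 0.1), (0.1, 0.9, 0.8), (0.2, 0.8, 0.9), (0.15, 0.85, 0.85)]
seeds = canopy_seeds(users, t2=0.5)
centers, labels = k_means(users, seeds)
print(len(seeds), labels)
```

A common reason for pairing the two steps is that the Canopy pass supplies both the number of clusters and their starting centers, which plain K-Means would otherwise need as input.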
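User nickname | Text | Sentiment | Confidence
回头万里人常叹 | How ironic. Disappointed yet again. | Negative | 0.87
diamondli99 | …Pathetic. PhD students in China are an utterly disadvantaged group… | Negative | 0.84
小兜里满满的幸福 | @中国大学生在线 @中国教育在线考研频道 @央视新闻 | Neutral | 0.94
秋桐小宅女 | Wait for the investigation results. | Neutral | 0.98
真无羽 | May the deceased rest in peace. Every experiment carries risk; thank you for your contribution to science. Rest in peace. | Positive | 0.82
三七二十一个酥 | …Cherish everyone around you! …May the deceased rest in peace, and may the alarm on laboratory safety keep ringing… | Positive | 0.88
放风筝的灰原哀 | …Scholarship is precious, but life is worth more; always pay attention to lab safety… | Positive | 0.70
叫我杏仁 | It is far more than sad. These were talents raised through years of hardship; it is truly unbearable. | Negative | 0.70
CMLY丶F | Remember that fire and water show no mercy. | Neutral | 0.73
Table 1  User sentiment classification results and confidence (excerpt)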
User nickname | PageRank | User nickname | PageRank
北京消防 | 1,736.80 | 澎湃新闻 | 897.18
江宁公安在线 | 52.31 | 南京大学 | 719.89
中国消防 | 2.24 | 陈迪Winston | 343.35
北京交通大学 | 997.05 | 头条新闻 | 317.21
懒懒的周小姐 | 0.15 | 小姐姐爱学习 | 0.15
慎独明智 | 0.15 | 一只阿迟儿 | 0.15
KaiHugo | 0.15 | 北欧DJ | 0.15
Table 2  User PageRank values (excerpt)
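Table 2 reports PageRank as the influence measure, but this page gives neither the formula nor how the graph was built. The sketch below assumes the non-normalized form PR(u) = (1 - d) + d * sum(PR(v) / outdeg(v)) with damping d = 0.85. Under that form a user with no incoming links scores exactly 0.15, which matches the lowest values in the table, though that match is only suggestive. The edge direction (from a reposting user to the account being reposted) and the node names are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Non-normalized PageRank by fixed-point iteration over a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_degree = {n: 0 for n in nodes}
    incoming = {n: [] for n in nodes}
    for src, dst in edges:
        out_degree[src] += 1
        incoming[dst].append(src)
    rank = {n: 1.0 - damping for n in nodes}
    for _ in range(iterations):
        # Each node keeps a base score of (1 - d) and receives a damped,
        # evenly split share of the score of every node pointing at it.
        rank = {
            n: (1.0 - damping) + damping * sum(rank[s] / out_degree[s] for s in incoming[n])
            for n in nodes
        }
    return rank

# Hypothetical repost graph: three ordinary users all repost one hub account.
edges = [("user_a", "hub"), ("user_b", "hub"), ("user_c", "hub")]
scores = pagerank(edges)
print(scores)
```

In this toy graph the three ordinary users stay at the 0.15 floor while the hub accumulates their contributions, the same pattern as the gap between official accounts and ordinary users in the table.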
Fig.2  SSE trend for the public opinion events
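An SSE (sum of squared errors) curve like the one in Fig.2 is typically read with the elbow method: run K-Means for increasing k and look for the point where SSE stops falling sharply. The sketch below shows how such a curve can be computed. It uses hypothetical one-dimensional data and a simple deterministic K-Means written for this illustration; it is not the study's data or its Weka configuration.

```python
def kmeans_1d(values, k, iterations=50):
    """Deterministic 1-D K-Means; seeds are spread evenly over the sorted data."""
    data = sorted(values)
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in data:
            idx = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        # Move each center to the mean of its group (empty groups keep their center).
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers, groups

def sse(centers, groups):
    """Sum of squared distances from every value to its cluster center."""
    return sum((v - c) ** 2 for c, g in zip(centers, groups) for v in g)

# Hypothetical feature values forming three well-separated groups.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
curve = {k: sse(*kmeans_1d(values, k)) for k in range(1, 5)}
for k, error in curve.items():
    print(k, round(error, 2))
```

On this toy data SSE falls steeply up to k = 3 and barely changes afterwards, so the elbow sits at three clusters.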
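Cluster | Clustering result, "Beijing Jiaotong University" event | Clustering result, "Nanjing University" event
0 | 9,491 (24%) | 1,675 (6%)
1 | 2,158 (5%) | 1,485 (5%)
2 | 14,974 (38%) | 2,729 (9%)
3 | 1,685 (4%) | 15,144 (50%)
4 | 3,250 (8%) | 3,360 (11%)
5 | 5,434 (14%) | 1,446 (5%)
6 | 458 (1%) | 2,540 (8%)
7 | 1,457 (4%) | 1,210 (4%)
8 | 901 (2%) | 610 (2%)
Table 3  User group clustering results for the two events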
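The percentages in Table 3 are not defined on this page. Assuming each one is a cluster's share of all clustered users in the same event, rounded to a whole percent, the short check below recomputes them from the counts in the table. The variable and function names are illustrative.

```python
# User counts per cluster (clusters 0 to 8), copied from Table 3.
bjtu_counts = [9491, 2158, 14974, 1685, 3250, 5434, 458, 1457, 901]
nju_counts = [1675, 1485, 2729, 15144, 3360, 1446, 2540, 1210, 610]

def shares(counts):
    """Each cluster's share of the event total, as a whole-number percentage."""
    total = sum(counts)
    return [round(100 * count / total) for count in counts]

print(sum(bjtu_counts), shares(bjtu_counts))
print(sum(nju_counts), shares(nju_counts))
```

Under that assumption the recomputed shares agree with every percentage printed in the table, with event totals of 39,808 and 30,199 users.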
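Fig.3  Clustering results for the public opinion events
Fig.4  Sentiment distribution by cluster
Fig.5  PageRank distribution by cluster
Fig.6  Relationships among the clustered user groups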