Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (6): 25-35    DOI: 10.11925/infotech.2096-3467.2020.0077
Clustering User Groups of Public Opinion Events from Multi-dimensional Social Network
Wang Xiwei1,2,3, Jia Ruonan1, Wei Yanan1, Zhang Liu1
1School of Management, Jilin University, Changchun 130022, China
2Research Center for Big Data Management, Jilin University, Changchun 130022, China
3Cyberspace Governance Research Center, Jilin University, Changchun 130022, China

[Objective] User groups are the main units through which public opinion spreads. This study uses clustering to identify the characteristics of user groups, which could help social network companies provide better services. [Methods] Drawing on group theory, we clustered users along three dimensions: influence, sentiment, and behavior. First, we collected user data from Sina Weibo. Then, we clustered users with the Canopy and K-Means algorithms. Finally, we visualized the findings with Neo4j and Weka. [Results] Within a single public opinion event, user groups differed in sentiment, influence, and behavior, while user groups from different events shared common characteristics. [Limitations] Both public opinion events in this study occurred at Chinese universities, and data were collected only from Sina Weibo. [Conclusions] The clustering results can inform administration strategies tailored to each user group, within a single event or across events.

Key words: Multi-dimensional; Social Network; Public Opinion; User Group; User Clustering
Received: 03 February 2020      Published: 06 July 2021
Chinese Library Classification (ZTFLH): TP393
Fund: Special Research Project of National Development and Security (Biosafety) of Jilin University (2020JDGFAZ003); Jilin University Postgraduate Innovation Fund (101832020CX057)
Corresponding Author: Jia Ruonan

Cite this article:

Wang Xiwei, Jia Ruonan, Wei Yanan, Zhang Liu. Clustering User Groups of Public Opinion Events from Multi-dimensional Social Network. Data Analysis and Knowledge Discovery, 2021, 5(6): 25-35.


Social Network Public Opinion User Clustering Analysis Method
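The abstract names Canopy and K-Means as the clustering steps, but this page does not show how the two were combined. A common arrangement, assumed here, is to let a Canopy pass choose the number of clusters and their starting centers, then refine them with K-Means. The sketch below is plain Python; the toy feature vectors (sentiment, influence, behavior scores), the thresholds `t1` and `t2`, and all function names are hypothetical and not taken from the paper.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2):
    """Canopy pass: return one centroid per canopy (loose t1 > tight t2 > 0)."""
    if not t1 > t2 > 0:
        raise ValueError("thresholds must satisfy t1 > t2 > 0")
    candidates = list(points)
    centers = []
    while candidates:
        seed = candidates.pop(0)
        # Every point within the loose threshold joins this canopy.
        canopy = [p for p in points if distance(seed, p) < t1]
        centers.append(tuple(sum(col) / len(canopy) for col in zip(*canopy)))
        # Points within the tight threshold may not seed another canopy.
        candidates = [p for p in candidates if distance(seed, p) >= t2]
    return centers


def kmeans(points, centers, max_iter=100):
    """Standard K-Means refinement starting from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: distance(p, centers[k]))
            for p in points
        ]
        # Recompute each center as the mean of its members.
        updated = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)
        if updated == centers:
            break
        centers = updated
    return labels, centers


# Hypothetical users: (sentiment score, influence score, behavior score).
users = [
    (0.90, 0.10, 0.20), (0.80, 0.20, 0.10), (0.85, 0.15, 0.15),
    (0.10, 0.90, 0.80), (0.20, 0.80, 0.90), (0.15, 0.85, 0.85),
]
seeds = canopy_centers(users, t1=0.6, t2=0.3)
labels, centers = kmeans(users, seeds)
print(len(seeds), labels)
```

With these toy vectors the Canopy pass yields two centers, so K-Means runs with k = 2 and needs no manually chosen cluster count. On real data the thresholds would have to be tuned.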
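User nickname | Text content | Sentiment | Confidence
回头万里人常叹 | How ironic. Disappointed once again. | Negative | 0.87
diamondli99 | ... Pitiful. China's PhD students are an absolutely vulnerable group ... | Negative | 0.84
小兜里满满的幸福 | @中国大学生在线 @中国教育在线考研频道 @央视新闻 | Neutral | 0.94
秋桐小宅女 | Wait for the investigation results. | Neutral | 0.98
真无羽 | May the deceased rest in peace. Every experiment carries risk; thank you for their contributions to science. Rest in peace. | Positive | 0.82
三七二十一个酥 | ... Cherish everyone around you! ... May the deceased rest in peace, and may the alarm for laboratory safety keep ringing ... | Positive | 0.88
放风筝的灰原哀 | ... Scholarship is precious, but life is worth more; always pay attention to experimental safety ... | Positive | 0.70
叫我杏仁 | This is more than just sad. Such painstakingly nurtured talent; it is truly too much to bear. | Negative | 0.70
CMLY丶F | Always remember that fire and water show no mercy. | Neutral | 0.73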
User Sentiment Classification Results and Confidence (Partial)
User nickname | PageRank value | User nickname | PageRank value
北京消防 | 1,736.80 | 澎湃新闻 | 897.18
江宁公安在线 | 52.31 | 南京大学 | 719.89
中国消防 | 2.24 | 陈迪Winston | 343.35
北京交通大学 | 997.05 | 头条新闻 | 317.21
懒懒的周小姐 | 0.15 | 小姐姐爱学习 | 0.15
慎独明智 | 0.15 | 一只阿迟儿 | 0.15
KaiHugo | 0.15 | 北欧DJ | 0.15
PageRank Value of Users (Partial)
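The page does not state which PageRank formulation produced these values. The shared floor of 0.15 among ordinary users is what a non-normalized PageRank with damping factor 0.85 would assign to accounts that nobody reposts, so the sketch below assumes that formulation: PR(u) = (1 - d) + d * sum of PR(v) / outdegree(v) over accounts v that point to u. The edge direction (reposter to original author), the account names, and the tiny graph are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Non-normalized PageRank over directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_degree = {node: 0 for node in nodes}
    for source, _ in edges:
        out_degree[source] += 1

    scores = {node: 1.0 - damping for node in nodes}
    for _ in range(iterations):
        incoming = {node: 0.0 for node in nodes}
        for source, target in edges:
            # Each account splits its score evenly across its out-edges.
            incoming[target] += scores[source] / out_degree[source]
        scores = {
            node: (1.0 - damping) + damping * incoming[node] for node in nodes
        }
    return scores


# Hypothetical repost graph: an edge points from the reposter to the author.
edges = [
    ("user_a", "official_account"),
    ("user_b", "official_account"),
    ("user_c", "official_account"),
    ("user_c", "media_account"),
]
scores = pagerank(edges)
for name, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.5f}")
```

In this toy graph the three ordinary users stay at the 0.15 floor, while the accounts they repost accumulate higher scores, which mirrors the gap between official accounts and ordinary users in the table.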
SSE Trend Chart of Each Event
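An SSE trend chart is typically read with the elbow method: run K-Means for a range of k, record the sum of squared errors (SSE) for each, and pick the k after which the curve flattens. The sketch below shows that computation in plain Python under stated assumptions: the toy points, the farthest-first initialization (chosen only to make the example deterministic), and the function names are hypothetical and do not reproduce the paper's data or settings.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding: repeatedly take the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans_sse(points, k, max_iter=100):
    """Run K-Means with k clusters and return the final SSE."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    # SSE: squared distance from every point to its nearest final center.
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Three well-separated toy groups, so the elbow should appear at k = 3.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (20.0, 0.0), (20.0, 1.0), (21.0, 0.0),
]
sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 6)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

On this toy data the SSE drops sharply up to k = 3 and changes little afterwards, which is the pattern an elbow reading looks for.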
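Cluster | Clustering result, “Beijing Jiaotong University” event | Clustering result, “Nanjing University” event
0 | 9,491 (24%) | 1,675 (6%)
1 | 2,158 (5%) | 1,485 (5%)
2 | 14,974 (38%) | 2,729 (9%)
3 | 1,685 (4%) | 15,144 (50%)
4 | 3,250 (8%) | 3,360 (11%)
5 | 5,434 (14%) | 1,446 (5%)
6 | 458 (1%) | 2,540 (8%)
7 | 1,457 (4%) | 1,210 (4%)
8 | 901 (2%) | 610 (2%)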
Clustering Results of Event User Group
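As a quick arithmetic check on the clustering table, the percentages can be recomputed from the counts. The sketch assumes that the spaced digit groups in the table are thousands separators (for example, 9 491 is 9,491) and that each percentage is the cluster's count divided by the event total, rounded to the nearest integer; the variable and function names are illustrative.

```python
# Cluster counts for clusters 0-8, read from the table
# (assumption: spaces inside numbers are thousands separators).
bjtu_counts = [9491, 2158, 14974, 1685, 3250, 5434, 458, 1457, 901]
nju_counts = [1675, 1485, 2729, 15144, 3360, 1446, 2540, 1210, 610]


def shares(counts):
    """Percentage share of each cluster, rounded to the nearest integer."""
    total = sum(counts)
    return [round(100 * count / total) for count in counts]


for name, counts in [("Beijing Jiaotong University", bjtu_counts),
                     ("Nanjing University", nju_counts)]:
    pct = shares(counts)
    largest = max(range(len(counts)), key=lambda i: counts[i])
    print(f"{name}: total={sum(counts)}, shares={pct}, largest cluster={largest}")
```

Under these assumptions the recomputed shares match the percentages printed in the table, with cluster 2 the largest group in the first event and cluster 3 the largest in the second.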
Clustering Result of Each Event
Sentiment Distribution of Each Cluster
PageRank Value Distribution of Each Cluster
User Group Relationship Map