Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (8): 107-118    DOI: 10.11925/infotech.2096-3467.2020.0091
Analyzing & Clustering Enterprise Microblog Users with Supernetwork
Xi Yunjiang¹, Du Diedie¹, Liao Xiao², Zhang Xuehong¹
¹ School of Business Administration, South China University of Technology, Guangzhou 510641, China
² School of Internet Finance and Information Engineering, Guangdong University of Finance, Guangzhou 510521, China
Abstract  

[Objective] This paper proposes an integrated modeling method for multi-dimensional user interest data and examines spectral clustering as a way to analyze user interests. [Methods] First, we retrieved Weibo (microblog) data of "Three Squirrels" and used a supernetwork model to integrate content data and user interaction data into a single model. Then, we constructed an interactive interest index and grouped the users with a spectral clustering algorithm. Finally, we evaluated the clustering results with the Silhouette Coefficient and the Davies-Bouldin (DB) index. [Results] With k set at 15, the DB value of the clustering reached 0.57, and the resulting clusters were evenly distributed. [Limitations] More research is needed to explore user characteristic data and the impact of different data dimensions on user interests. [Conclusions] This study proposes maintenance and marketing suggestions for enterprise Weibo profiles, which will help enterprises identify user interests and improve marketing effectiveness.

Key words: Supernetwork; Enterprise Microblog; User Interests; Spectral Clustering
Received: 10 February 2020      Published: 14 September 2020
CLC Number (ZTFLH): G206
Corresponding Authors: Liao Xiao     E-mail: 1448362251@qq.com

Cite this article:

Xi Yunjiang, Du Diedie, Liao Xiao, Zhang Xuehong. Analyzing & Clustering Enterprise Microblog Users with Supernetwork. Data Analysis and Knowledge Discovery, 2020, 4(8): 107-118.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0091     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I8/107

Topics-Keywords Network (Partial)
Users-Topics Network (Partial)
EMTIS Supernetwork
Process of Spectral Clustering Algorithm Based on EMTIS
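The abstract states that users were grouped with a spectral clustering algorithm (k = 15). As a rough illustration of the idea only, and not the authors' implementation, the following minimal sketch performs a single two-way spectral split using the normalized-cut relaxation. The function name `spectral_bisection`, the toy `affinity` matrix, and the use of power iteration are all assumptions made for this example; a full pipeline would typically take several eigenvectors and run k-means on them with a numerical library.

```python
import math
import random

def spectral_bisection(W, iters=300, seed=0):
    """Split nodes into two groups from a symmetric affinity matrix W.

    Assumes every row of W has a positive sum (no isolated nodes).
    """
    n = len(W)
    deg = [sum(row) for row in W]
    d_sqrt = [math.sqrt(d) for d in deg]
    # Shifted normalized affinity (I + D^-1/2 W D^-1/2) / 2: eigenvalues lie in [0, 1].
    M = [[(W[i][j] / (d_sqrt[i] * d_sqrt[j]) + (1.0 if i == j else 0.0)) / 2.0
          for j in range(n)] for i in range(n)]
    # The top eigenvector is known in closed form: D^1/2 * 1, normalized.
    total = math.sqrt(sum(deg))
    top = [s / total for s in d_sqrt]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Power iteration with deflation converges to the second eigenvector.
    for _ in range(iters):
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in v))
        if length == 0.0:
            break
        v = [a / length for a in v]
    # The sign of each entry decides the group; fix the sign so node 0 is in group 0.
    labels = [1 if x >= 0 else 0 for x in v]
    if labels[0] == 1:
        labels = [1 - l for l in labels]
    return labels

# Hypothetical user-user affinities: two tight triangles joined by one weak link.
affinity = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
labels = spectral_bisection(affinity)
```

In this toy case the weak link between the two triangles is the cheapest place to cut, so the first three nodes and the last three nodes end up in different groups.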
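Topic 1 keyword | Weight | Topic 2 keyword | Weight
Double 12 (双12) | 0.824467 | Group photo (合照) | 0.629198
Benefits (福利) | 0.559995 | Selfie (自拍) | 0.336896
"Eating dirt", i.e. broke after shopping (吃土) | 0.412233 | Celebrity show (大咖秀) | 0.314599
Compulsive shoppers (剁手党) | 0.412233 | V-sign hand gesture (剪刀手) | 0.314599
Foodie (吃货) | 0.412233 | Double 11 (双十一) | 0.314599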
Feature Word Extraction Example
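User ID | Feature words
2642129313 | throw pillow (抱枕), travel (旅游), snacks (零食), photo shoot (写真), web link (网页链接), lucky draw (抽奖), doll (玩偶), dried fruit (果干)
1042447931 | benefits (福利), weekend (周末), repost (转发), gift pack (礼包), Nut phone (坚果手机), going home (回家), renewal season (焕新季)
5026461834 | game (游戏), new product (新品), free tasting (试吃), Cloud Orchard (云果园), dried fruit (果干), correct guess (猜中), cute pets (萌宠)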
Users and Corresponding Feature Words
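One way to picture how a topics-keywords layer and a users-topics layer combine into per-user feature words is sketched below. This is a simplified illustration under stated assumptions, not the authors' procedure: the function `user_feature_words`, the user names, and the participation counts are hypothetical, and only the keyword weights are taken from the feature word extraction example (keywords rendered in English).

```python
from collections import defaultdict

# Topic -> keyword weights (weights from the feature word extraction example).
topic_keywords = {
    "topic_1": {"Double 12": 0.824467, "benefits": 0.559995, "foodie": 0.412233},
    "topic_2": {"group photo": 0.629198, "selfie": 0.336896, "Double 11": 0.314599},
}

# User -> number of interactions with each topic (hypothetical counts).
user_topics = {
    "user_a": {"topic_1": 3},
    "user_b": {"topic_1": 1, "topic_2": 2},
}

def user_feature_words(user_topics, topic_keywords, top_n=3):
    """Project each user onto keywords through the topics they took part in."""
    result = {}
    for user, topics in user_topics.items():
        scores = defaultdict(float)
        for topic, count in topics.items():
            for word, weight in topic_keywords.get(topic, {}).items():
                # A keyword counts more when the user engages more with its topic.
                scores[word] += count * weight
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[user] = [word for word, _ in ranked[:top_n]]
    return result

features = user_feature_words(user_topics, topic_keywords)
```

Here `user_b` interacts twice with topic 2 and once with topic 1, so "group photo" outranks "Double 12" in that user's feature words even though its raw weight is lower.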
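Rank | Core word | Frequency
1 | master (主人) | 478
2 | repost (转发) | 220
3 | foodie (吃货) | 162
4 | web link (网页链接) | 152
5 | nuts (坚果) | 151
6 | snacks (零食) | 95
7 | New Year goods (年货) | 75
8 | vote (投票) | 71
9 | going home (回家) | 71
10 | benefits (福利) | 66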
Core Feature Word Statistics (Top10)
Rank | User ID | Topics participated in
1 | 2238363480 | 362
2 | 2389941761 | 264
3 | 1939554543 | 257
4 | 5348522194 | 185
5 | 5591452414 | 185
6 | 5497626858 | 154
7 | 2834492565 | 150
8 | 2267365535 | 149
9 | 1712477690 | 141
10 | 5208353983 | 122
User Participation Topic Statistics (Top10)
Silhouette Coefficient Cluster Evaluation
Davies-Bouldin Cluster Evaluation
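The abstract reports that the clustering was evaluated with the Silhouette Coefficient and the Davies-Bouldin index. The sketch below shows how both measures can be computed from their standard definitions, assuming Euclidean distance and at least two non-degenerate clusters. The sample `points` and `labels` are made up for illustration and are unrelated to the paper's data.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def davies_bouldin(points, labels):
    """Lower is better: mean worst-case ratio of cluster scatter to centroid separation."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    centroids = {l: [sum(c) / len(ps) for c in zip(*ps)] for l, ps in clusters.items()}
    scatter = {l: sum(dist(p, centroids[l]) for p in ps) / len(ps)
               for l, ps in clusters.items()}
    total = 0.0
    for i in clusters:
        total += max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                     for j in clusters if j != i)
    return total / len(clusters)

def silhouette(points, labels):
    """Higher is better: mean of (b - a) / max(a, b) over all points."""
    scores = []
    for i, (p, l) in enumerate(zip(points, labels)):
        by_cluster = {}
        for j, (q, m) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(m, []).append(dist(p, q))
        own = by_cluster.pop(l, None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster scores 0 by convention
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical 2-D user embeddings: two well-separated pairs.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
db = davies_bouldin(points, labels)
sil = silhouette(points, labels)
```

A compact, well-separated grouping gives a small Davies-Bouldin value and a silhouette close to 1, while a grouping that mixes the two pairs scores worse on both. Library versions of the same measures exist, for example `davies_bouldin_score` and `silhouette_score` in scikit-learn.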
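Cluster | Users | Group name | Main keywords
1 | 336 | Travel enthusiasts | snack pack (零食包), handy gadget (神器), Virgo (处女座), beautiful photos (美照), travel (旅行)
2 | 391 | Homebodies | benefits (福利), Squirrel-kun (松鼠君), homepage admin (主页菌), weekend (周末)
3 | 788 | Singles and couples | Qixi Festival (七夕), single (单身), buddy (基友), avatar (头像), plush doll (公仔)
4 | 412 | Lucky-draw enthusiasts 1 | Nut phone (坚果手机), real talent (实力派), prize draw (开奖), new skill (新技能), trendy gift (潮礼)
5 | 215 | New-product followers | Cloud Orchard (云果园), new product (新品), dried fruit (果干), link (链接)
6 | 389 | Merchandise fans | avatar (头像), comics (漫画), doodle (涂鸦), wallpaper (壁纸), contest (大赛)
7 | 666 | New Year goods buyers | New Year goods (年货), big gift pack (大礼包), sales volume (销售额), web link (网页链接), big gift box (大礼盒)
8 | 312 | Lucky-draw enthusiasts 2 | movie ticket (电影票), game (游戏), cute cup (萌杯), nibbles (零嘴)
9 | 343 | Female discount seekers | throw pillow (抱枕), coupon (优惠券), Juhuasuan (聚划算), queen (女王), Girls' Day (女生节)
10 | 285 | Students | renewal season (焕新季), back-to-school gift (开学礼), gift pack (礼包)
11 | 236 | Users with families | foodie (吃货), family bucket (全家桶), taste (味觉), mom (妈妈), redemption code (兑换码)
12 | 731 | Double 11 shoppers | Double 11 (双11), Tmall (天猫), Singles' Day (光棍节), shopping cart (购物车), Taobao share code (淘口令)
13 | 330 | Job seekers | exchange meeting (交流会), recruitment (招聘), product tester (体验师)
14 | 272 | Lucky-draw enthusiasts 3 | USB drive (U盘), dream (梦想), commercial (广告片), Xiaomi phone (小米手机)
15 | 284 | Employees | year-end bonus (年终奖), red envelope (红包), founder (创始人), team (团队), Spring Festival (春节)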
User Category and User Group Characteristics