Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (8): 107-118     https://doi.org/10.11925/infotech.2096-3467.2020.0091
Research Paper
Analyzing & Clustering Enterprise Microblog Users with Supernetwork
Xi Yunjiang1, Du Diedie1, Liao Xiao2, Zhang Xuehong1
1School of Business Administration, South China University of Technology, Guangzhou 510641, China
2School of Internet Finance and Information Engineering, Guangdong University of Finance, Guangzhou 510521, China
Abstract

[Objective] This paper proposes an integrated modeling method for multi-dimensional user interest data and, on that basis, studies a spectral clustering method for user interests. [Methods] Using Weibo (microblog) data from "Three Squirrels" as a case, we apply a supernetwork model to integrate post content with user interaction data, construct an interaction interest index, and partition users into groups with a spectral clustering algorithm. The clustering results are evaluated with the Silhouette Coefficient and the Davies-Bouldin (DB) index. [Results] Comparing the best clustering obtained from three types of user feature vectors, we find that at k = 15 the clustering based on the topic-interaction supernetwork feature vector reaches a DB value of 0.57. It outperforms the feature vectors built from interaction data alone or from post content alone: its clusters are more evenly distributed and internally more compact. [Limitations] The selected user feature data do not cover every aspect of user characteristics, and the relative influence of different data dimensions on user interests remains to be explored. [Conclusions] Based on the group distribution and interest characteristics of enterprise Weibo users, the study proposes corresponding maintenance and marketing suggestions, which can help enterprises identify user interests and improve the effectiveness of Weibo marketing.

Key words: Supernetwork; Enterprise Microblog; User Interests; Spectral Clustering
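The DB value quoted in the abstract is easier to read with the index written out. The following is a minimal sketch of the standard Davies-Bouldin index in plain Python, not the authors' code; the function names and the four toy points are illustrative assumptions. Lower values indicate tighter, better-separated clusters.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better).

    For each cluster, take the worst ratio
    (scatter_i + scatter_j) / distance(centroid_i, centroid_j)
    over all other clusters, then average over clusters.
    Assumes no two clusters share the same centroid.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    ids = sorted(clusters)
    if len(ids) < 2:
        raise ValueError("need at least two clusters")

    cents = {c: centroid(clusters[c]) for c in ids}
    # Scatter: mean distance of a cluster's members to its centroid.
    scatter = {
        c: sum(math.dist(p, cents[c]) for p in clusters[c]) / len(clusters[c])
        for c in ids
    }
    total = 0.0
    for i in ids:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in ids
            if j != i
        )
    return total / len(ids)


# Toy data (hypothetical): two pairs of points far apart.
toy_points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = davies_bouldin(toy_points, [0, 0, 1, 1])  # compact, well separated
bad = davies_bouldin(toy_points, [0, 1, 0, 1])   # stretched, overlapping
print(good, bad)
```

On the toy data the natural grouping scores far lower than the mismatched one, which is the sense in which a lower DB value reflects a better partition.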
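Received: 2020-02-10      Published: 2020-09-14
CLC number (ZTFLH):  G206
Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Knowledge Mining and Integration Methods for Enterprise Microblogs Based on Supernetworks" (71371077).
Corresponding author: Liao Xiao     E-mail: 1448362251@qq.com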
Cite this article:
Xi Yunjiang, Du Diedie, Liao Xiao, Zhang Xuehong. Analyzing & Clustering Enterprise Microblog Users with Supernetwork[J]. Data Analysis and Knowledge Discovery, 2020, 4(8): 107-118.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0091      or      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I8/107
Fig.1  Topic-keyword network (partial)
Fig.2  Fan user-topic network (partial)
Fig.3  EMTIS supernetwork
Fig.4  Flow of the EMTIS-based spectral clustering algorithm
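Fig.4 survives on this page only as a caption, so the exact steps of the authors' algorithm are not shown here. As a rough illustration of what spectral clustering does, the sketch below handles the two-cluster case in plain Python: it builds a Gaussian affinity matrix, normalizes it by node degree, and splits the points by the sign of the second eigenvector, found by power iteration with deflation. The Gaussian kernel, `sigma`, the two-way split, and every name are assumptions for illustration. The paper partitions users into k = 15 groups, which would normally use several eigenvectors followed by k-means rather than a sign test.

```python
import math


def gaussian_affinity(points, sigma):
    """Symmetric affinity matrix W with W[i][j] = exp(-d^2 / (2 sigma^2))."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            W[i][j] = W[j][i] = math.exp(-d * d / (2 * sigma * sigma))
    return W


def spectral_bipartition(points, sigma=1.0, iters=200):
    """Split points into two groups with the second eigenvector of
    M = D^{-1/2} W D^{-1/2}. Assumes every point has nonzero degree."""
    n = len(points)
    W = gaussian_affinity(points, sigma)
    deg = [sum(row) for row in W]

    # B = (M + I) / 2 has the same eigenvectors as M, with eigenvalues
    # shifted into [0, 1], so power iteration finds the largest ones.
    B = [
        [(W[i][j] / math.sqrt(deg[i] * deg[j]) + (1.0 if i == j else 0.0)) / 2
         for j in range(n)]
        for i in range(n)
    ]

    # The top eigenvector of M is known in closed form: sqrt(deg), normalized.
    norm = math.sqrt(sum(deg))
    v1 = [math.sqrt(d) / norm for d in deg]

    x = [float(i) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        x = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
        proj = sum(a * b for a, b in zip(x, v1))
        x = [a - proj * b for a, b in zip(x, v1)]  # deflate the top eigenvector
        length = math.sqrt(sum(a * a for a in x))
        x = [a / length for a in x]

    return [1 if a > 0 else 0 for a in x]


# Toy data (hypothetical): two small blobs far apart.
blobs = [(0, 0), (0, 1), (1, 0), (8, 8), (8, 9), (9, 8)]
print(spectral_bipartition(blobs))
```

Which blob receives label 0 or 1 depends on the start vector; only the grouping itself is meaningful.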
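Topic 1 keyword | Weight | Topic 2 keyword | Weight
Double 12 (双12) | 0.824467 | group photo (合照) | 0.629198
perks (福利) | 0.559995 | selfie (自拍) | 0.336896
"eating dirt", i.e. broke (吃土) | 0.412233 | celebrity show (大咖秀) | 0.314599
shopaholics (剁手党) | 0.412233 | V-sign pose (剪刀手) | 0.314599
foodie (吃货) | 0.412233 | Double 11 (双十一) | 0.314599
Table 1  Example of extracted feature words (Top 5 keywords by weight)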
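The topic-keyword weights in Table 1 form one layer of the network; linking users to topics gives a second layer. The sketch below shows one simple way the two layers could be combined into a per-user keyword profile. It is an assumption for illustration only: this page does not give the formula for the paper's interaction interest index, and the user names, interaction counts, and the share-weighted sum are all hypothetical. Only the keyword weights come from Table 1.

```python
# Layer 1: topic -> keyword weights, copied from Table 1.
topic_keywords = {
    "topic1": {"双12": 0.824467, "福利": 0.559995, "吃土": 0.412233,
               "剁手党": 0.412233, "吃货": 0.412233},
    "topic2": {"合照": 0.629198, "自拍": 0.336896, "大咖秀": 0.314599,
               "剪刀手": 0.314599, "双十一": 0.314599},
}

# Layer 2: user -> number of interactions per topic (hypothetical values).
user_topics = {
    "user_a": {"topic1": 3, "topic2": 1},
    "user_b": {"topic2": 2},
}


def user_keyword_profile(user, user_topics, topic_keywords):
    """Weight each topic's keywords by the user's share of interactions
    with that topic, then sum across topics (a stand-in, not the paper's index)."""
    interactions = user_topics[user]
    total = sum(interactions.values())
    profile = {}
    for topic, count in interactions.items():
        share = count / total
        for keyword, weight in topic_keywords[topic].items():
            profile[keyword] = profile.get(keyword, 0.0) + share * weight
    return profile


profile_a = user_keyword_profile("user_a", user_topics, topic_keywords)
top_keyword = max(profile_a, key=profile_a.get)
print(top_keyword, round(profile_a[top_keyword], 4))
```

Vectors of this kind, one per user over a shared keyword vocabulary, are the sort of input a clustering step would consume.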
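User ID | Feature words
2642129313 | throw pillow (抱枕), travel (旅游), snacks (零食), photo portrait (写真), web link (网页链接), lucky draw (抽奖), doll (玩偶), dried fruit (果干)
1042447931 | perks (福利), weekend (周末), repost (转发), gift pack (礼包), Nut phone (坚果手机), going home (回家), renewal season (焕新季)
5026461834 | game (游戏), new product (新品), free tasting (试吃), Cloud Orchard (云果园), dried fruit (果干), correct guess (猜中), cute pets (萌宠)
Table 2  Users and their feature words (selected examples)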
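Rank | Core word | Frequency
1 | master (主人) | 478
2 | repost (转发) | 220
3 | foodie (吃货) | 162
4 | web link (网页链接) | 152
5 | nuts (坚果) | 151
6 | snacks (零食) | 95
7 | New Year goods (年货) | 75
8 | vote (投票) | 71
9 | going home (回家) | 71
10 | perks (福利) | 66
Table 3  Core feature word statistics (Top 10)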
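A ranking like Table 3 can be produced by tallying feature words across users. The sketch below does this for the three sample users listed in Table 2, so its counts are tiny and do not reproduce Table 3, which is computed over the full dataset. Whether the paper counts word occurrences or distinct users is not stated on this page; counting one hit per user per word is an assumption here.

```python
from collections import Counter

# Feature words of the three sample users from Table 2.
user_words = {
    "2642129313": ["抱枕", "旅游", "零食", "写真", "网页链接", "抽奖", "玩偶", "果干"],
    "1042447931": ["福利", "周末", "转发", "礼包", "坚果手机", "回家", "焕新季"],
    "5026461834": ["游戏", "新品", "试吃", "云果园", "果干", "猜中", "萌宠"],
}

# Count each word once per user (set() guards against repeats within a user).
counts = Counter()
for words in user_words.values():
    counts.update(set(words))

# Rank words by frequency, most frequent first.
for rank, (word, freq) in enumerate(counts.most_common(3), start=1):
    print(rank, word, freq)
```

In this sample only 果干 (dried fruit) is shared by two users, so it ranks first.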
Rank | User ID | Topics participated in
1 | 2238363480 | 362
2 | 2389941761 | 264
3 | 1939554543 | 257
4 | 5348522194 | 185
5 | 5591452414 | 185
6 | 5497626858 | 154
7 | 2834492565 | 150
8 | 2267365535 | 149
9 | 1712477690 | 141
10 | 5208353983 | 122
Table 4  Topic participation by user (Top 10)
Fig.5  Clustering evaluation with the Silhouette Coefficient
Fig.6  Clustering evaluation with the Davies-Bouldin index
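Only the captions of the two evaluation plots survive here. For readers unfamiliar with the first metric, the following is a minimal plain-Python sketch of the standard mean Silhouette Coefficient, not the authors' code; the function name and the toy points are illustrative assumptions. Values near 1 indicate well-separated clusters, and negative values indicate points that sit closer to another cluster than to their own.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points.

    For point i: a = mean distance to the other members of its cluster,
    b = smallest mean distance to the members of any other cluster,
    s(i) = (b - a) / max(a, b). Singleton clusters contribute s(i) = 0.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # s(i) = 0 by convention
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Toy data (hypothetical): four points on a line.
line = [(0,), (1,), (10,), (11,)]
print(silhouette_score(line, [0, 0, 1, 1]))  # close to 0.9: clean split
print(silhouette_score(line, [0, 1, 0, 1]))  # negative: mismatched split
```

Scanning such a score over candidate values of k is a common way to choose the number of clusters.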
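Cluster | Users | Group name | Main keywords
1 | 336 | Travel enthusiasts | snack pack (零食包), handy gadget (神器), Virgo (处女座), pretty photos (美照), travel (旅行)
2 | 391 | Homebodies | perks (福利), Mr. Squirrel (松鼠君), page admin (主页菌), weekend (周末)
3 | 788 | Singles and couples | Qixi Festival (七夕), single (单身), best buddy (基友), avatar (头像), plush doll (公仔)
4 | 412 | Lucky-draw enthusiasts 1 | Nut phone (坚果手机), powerhouse (实力派), draw results (开奖), new skill (新技能), trendy gift (潮礼)
5 | 215 | New-product followers | Cloud Orchard (云果园), new product (新品), dried fruit (果干), link (链接)
6 | 389 | Merchandise fans | avatar (头像), comics (漫画), doodle (涂鸦), wallpaper (壁纸), contest (大赛)
7 | 666 | New Year goods buyers | New Year goods (年货), big gift pack (大礼包), sales revenue (销售额), web link (网页链接), big gift box (大礼盒)
8 | 312 | Lucky-draw enthusiasts 2 | movie ticket (电影票), game (游戏), cute cup (萌杯), nibbles (零嘴)
9 | 343 | Female deal seekers | throw pillow (抱枕), coupon (优惠券), Juhuasuan (聚划算), queen (女王), Girls' Day (女生节)
10 | 285 | Students | renewal season (焕新季), back-to-school gift (开学礼), gift pack (礼包)
11 | 236 | People with families | foodie (吃货), family bucket (全家桶), taste (味觉), mom (妈妈), redemption code (兑换码)
12 | 731 | Double 11 shoppers | Double 11 (双11), Tmall (天猫), Singles' Day (光棍节), shopping cart (购物车), Taobao share code (淘口令)
13 | 330 | Job seekers | networking event (交流会), recruitment (招聘), product tester (体验师)
14 | 272 | Lucky-draw enthusiasts 3 | USB flash drive (U盘), dream (梦想), commercial (广告片), Xiaomi phone (小米手机)
15 | 284 | Employees | year-end bonus (年终奖), red envelope (红包), founder (创始人), team (团队), Spring Festival (春节)
Table 5  User clusters and group characteristics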
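The cluster sizes in Table 5 can be summarized with a few lines of arithmetic. The sketch below uses only the sizes listed in the table; the variable names and the choice of summary figures are illustrative.

```python
# Cluster sizes copied from Table 5 (cluster number -> users).
cluster_sizes = {
    1: 336, 2: 391, 3: 788, 4: 412, 5: 215,
    6: 389, 7: 666, 8: 312, 9: 343, 10: 285,
    11: 236, 12: 731, 13: 330, 14: 272, 15: 284,
}

total = sum(cluster_sizes.values())
shares = {c: n / total for c, n in cluster_sizes.items()}

largest = max(cluster_sizes, key=cluster_sizes.get)
smallest = min(cluster_sizes, key=cluster_sizes.get)

# The three lucky-draw clusters (4, 8, 14) taken together.
lucky_draw_share = sum(cluster_sizes[c] for c in (4, 8, 14)) / total

print(total)                                   # 5990
print(largest, round(shares[largest] * 100, 1))    # 3 13.2
print(smallest, round(shares[smallest] * 100, 1))  # 5 3.6
print(round(lucky_draw_share * 100, 1))        # 16.6
```

The fifteen clusters sum to 5,990 users. The largest (cluster 3, singles and couples) holds about 13.2% of them and the smallest (cluster 5, new-product followers) about 3.6%, while the three lucky-draw clusters together account for roughly 16.6%.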