Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (12): 99-112    DOI: 10.11925/infotech.2096-3467.2022.0127
Identifying Untrusted Weibo Users Based on Improved Dempster-Shafer Evidence Theory
Xu Jianmin1, Wang Kailin1, Wu Shufang2
1College of Cyberspace Security and Computer, Hebei University, Baoding 071002, China
2College of Management, Hebei University, Baoding 071002, China
Abstract  

[Objective] This paper modifies the Dempster-Shafer evidence theory to identify untrusted Sina Weibo (microblog) users in the presence of subjective uncertainty. [Methods] First, we used the evidence distance to improve the original Dempster-Shafer evidence theory. Then, we transformed the credibility of each user's historical posts into evidence and fused that evidence to generate the user's trust interval. Finally, we identified untrusted users with the Decision Tree algorithm and the trust interval. [Results] Compared with the existing methods, the new model reduced the processing time by up to 287.4 seconds, increased the F1 value by up to 31.9 percentage points, and obtained the best Chi-Square value in the consistency test. [Limitations] We only investigated the subjective uncertainties caused by time decay and evidence conflict; the impact of cognitive differences on subjectivity still needs to be considered. [Conclusions] The proposed method can effectively identify untrusted users on Sina Weibo.
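
The abstract does not say which evidence distance is used. As an illustrative sketch only, the code below computes the Jousselme distance (reference [31]), one widely used distance between two bodies of evidence, on a two-element frame. The function names and the dictionary representation of a mass function are assumptions made for this example and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def nonempty_subsets(frame):
    """Return all non-empty subsets of the frame as frozensets."""
    items = sorted(frame)
    return [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def jousselme_distance(m1, m2, frame):
    """Jousselme distance sqrt(0.5 * d^T D d), with d = m1 - m2 and
    D[A][B] = |A & B| / |A | B| over the non-empty subsets of the frame."""
    subsets = nonempty_subsets(frame)
    diff = [m1.get(s, 0.0) - m2.get(s, 0.0) for s in subsets]
    total = 0.0
    for i, a in enumerate(subsets):
        for j, b in enumerate(subsets):
            total += diff[i] * diff[j] * len(a & b) / len(a | b)
    # Guard against tiny negative values caused by rounding.
    return sqrt(0.5 * max(total, 0.0))


frame = {"theta0", "theta1"}
m_trusted = {frozenset({"theta0"}): 1.0}    # all mass on theta0
m_untrusted = {frozenset({"theta1"}): 1.0}  # all mass on theta1

d_conflict = jousselme_distance(m_trusted, m_untrusted, frame)
d_same = jousselme_distance(m_trusted, m_trusted, frame)
print(d_conflict)  # 1.0: fully conflicting evidence is maximally distant
print(d_same)      # 0.0: identical evidence
```

A distance of this kind can be turned into a weight for each piece of evidence, so that evidence far from the rest contributes less to the fusion. How the paper does this exactly is not stated on this page.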

Key words: Microblog; Untrusted Users; Subjective Uncertainty; Dempster-Shafer Evidence Theory
Received: 17 February 2022      Published: 03 February 2023
ZTFLH: G203; TP182
Fund: National Social Science Fund of China (17BTQ068); Key Projects of Humanities and Social Sciences in Hebei Province (ZD202102)
Corresponding Author: Wu Shufang, ORCID: 0000-0002-9885-6944, E-mail: shufang_44@126.com

Cite this article:

Xu Jianmin, Wang Kailin, Wu Shufang. Identifying Untrusted Weibo Users Based on Improved Dempster-Shafer Evidence Theory. Data Analysis and Knowledge Discovery, 2022, 6(12): 99-112.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0127     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I12/99

A Framework for Untrusted Users Identification on Microblog Based on Improved Dempster-Shafer Theory of Evidence
Example of Untrusted Posts
A Fragment of the Emoji Mapping Table
Example of Emoji Conversion
Evidence | A0 = {θ0} | A1 = {θ1} | A2 = {θ0, θ1} | A3 = ∅
m1 | 1.0 | 0.0 | 0.0 | 0.0
m2 | 0.0 | 1.0 | 0.0 | 0.0
M_ideal | 0.0 | 0.0 | 1.0 | 0.0
M | undefined | undefined | undefined | 0.0
Example of Conflict Evidence Integration
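
To make the conflict example concrete, here is a minimal sketch of the classical, unmodified Dempster combination rule applied to m1 and m2 from the table. All of the product mass falls on the empty set, so the conflict coefficient K equals 1, the normalization factor 1 - K is zero, and the combined mass M is undefined. The function name `dempster_combine`, the choice to return `None` under total conflict, and the second pair of mass functions (m3, m4) are assumptions for illustration.

```python
def dempster_combine(m1, m2):
    """Classical Dempster rule for two mass functions given as
    {frozenset: mass}. Returns (combined, K); combined is None
    when K == 1, because the rule then divides by zero."""
    joint = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            inter = a & b
            if inter:
                joint[inter] = joint.get(inter, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # mass landing on the empty set
    if abs(1.0 - conflict) < 1e-12:
        return None, conflict
    scale = 1.0 - conflict
    return {s: v / scale for s, v in joint.items()}, conflict


T0 = frozenset({"theta0"})
T1 = frozenset({"theta1"})
BOTH = T0 | T1

# m1 and m2 from the table: complete conflict.
combined, k = dempster_combine({T0: 1.0}, {T1: 1.0})
print(k, combined)

# Hypothetical partially conflicting evidence, for contrast.
m3 = {T0: 0.6, BOTH: 0.4}
m4 = {T1: 0.5, BOTH: 0.5}
combined2, k2 = dempster_combine(m3, m4)
print(k2, combined2)
```

The ideal result in the table instead assigns all mass to {θ0, θ1}, that is, it reports that the two sources cannot settle the question.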
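Proposition Aj | Meaning
A0 = {user trusted} | The Weibo user is trusted
A1 = {user untrusted} | The Weibo user is untrusted
A2 = {user trusted, user untrusted} | The trustworthiness of the Weibo user cannot be determined
A3 = ∅ | No trustworthiness judgment has been made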
Propositions on Frame Θ
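
The abstract mentions a trust interval for each user without defining it on this page. A common reading in Dempster-Shafer theory, assumed here, is the interval [Bel(A0), Pl(A0)] for the proposition that the user is trusted. The sketch below computes both bounds; the mass values are hypothetical and the function names are chosen for this example only.

```python
def belief(m, a):
    """Bel(a): total mass of the non-empty focal sets contained in a."""
    return sum(v for s, v in m.items() if s and s <= a)


def plausibility(m, a):
    """Pl(a): total mass of the focal sets that intersect a."""
    return sum(v for s, v in m.items() if s & a)


TRUSTED = frozenset({"trusted"})
UNTRUSTED = frozenset({"untrusted"})
UNKNOWN = TRUSTED | UNTRUSTED  # corresponds to A2 in the table

# Hypothetical fused mass function for one user.
m = {TRUSTED: 0.5, UNTRUSTED: 0.25, UNKNOWN: 0.25}

interval = (belief(m, TRUSTED), plausibility(m, TRUSTED))
print(interval)  # (0.5, 0.75)
```

The width of the interval equals the mass left on A2, so a wider interval means less of the evidence commits to either answer.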
Sample Distribution of Untrusted Users
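Method | Description
DS-DT | The method proposed in this paper, which identifies untrusted users based on the improved D-S evidence theory
E-N[4] | Identifies fake users with minimum-entropy discretization and the naive Bayes algorithm
DDTLS[16] | Detects fake users with the aid of two-layer sampling active learning
Truser[11] | Mines untrusted users with two-stage ISODATA clustering
A-D[15] | Mines sensitive nodes that maliciously incite radical emotions, based on sentiment orientation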
Introduction to Comparison Methods
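Count | Very low credit | Fairly low credit | Average credit | Fairly good credit | Excellent credit | Total
Number identified as untrusted | u0 | u1 | u2 | u3 | u4 | u
Actual number of untrusted users | v0 | v1 | v2 | v3 | v4 | v
Total | u0 + v0 | u1 + v1 | u2 + v2 | u3 + v3 | u4 + v4 | u + v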
The Contingency Table About Credit Ratings and Credibility of Users
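
This page does not give the formula behind the consistency test. One standard choice for a table of this shape, assumed here, is the Pearson chi-square statistic, where a smaller value means the identified and actual distributions over credit ratings agree more closely. The counts below are hypothetical stand-ins for u0..u4 and v0..v4, and `pearson_chi_square` is a name chosen for this sketch.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts.
    Assumes every row total and column total is positive."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Hypothetical counts per credit rating (very low .. excellent).
identified = [40, 30, 15, 10, 5]  # stands in for u0..u4
actual = [44, 28, 14, 9, 5]       # stands in for v0..v4

chi2 = pearson_chi_square([identified, actual])
print(round(chi2, 3))
```

When the two rows are proportional, every observed count equals its expected count and the statistic is zero.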
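Method | Test set 1 (s) | Test set 2 (s) | Test set 3 (s) | Test set 4 (s) | Test set 5 (s)
DS-DT | 422.7 | 407.3 | 389.8 | 428.5 | 411.3
E-N | 453.5 | 431.0 | 428.6 | 447.1 | 426.1
DDTLS | 687.7 | 673.1 | 677.2 | 703.8 | 692.0
Truser | 653.9 | 650.4 | 632.3 | 677.4 | 655.0
A-D | 604.4 | 581.7 | 573.0 | 611.8 | 596.1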
Time Consumption on 5 Test Sets
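
As a quick arithmetic check, the sketch below recomputes from the table the mean time per method and the largest per-test-set saving of DS-DT over any baseline. It assumes the values are in seconds, as the abstract suggests. The largest saving, 287.4, occurs against DDTLS on test set 3 and matches the figure quoted in the abstract.

```python
times = {
    "DS-DT": [422.7, 407.3, 389.8, 428.5, 411.3],
    "E-N": [453.5, 431.0, 428.6, 447.1, 426.1],
    "DDTLS": [687.7, 673.1, 677.2, 703.8, 692.0],
    "Truser": [653.9, 650.4, 632.3, 677.4, 655.0],
    "A-D": [604.4, 581.7, 573.0, 611.8, 596.1],
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


proposed = times["DS-DT"]

# (saving, baseline name, 1-based test set index) for every comparison.
savings = [
    (other[i] - proposed[i], name, i + 1)
    for name, other in times.items() if name != "DS-DT"
    for i in range(len(proposed))
]
max_saving, max_method, max_test_set = max(savings)

for name, values in times.items():
    print(name, round(mean(values), 2))
print(round(max_saving, 1), max_method, max_test_set)
```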
F1-Measure on Fraudulent, Pornographic, Insulting, Script-based Users
Recall on Fraudulent, Pornographic, Insulting, Script-based Users
Precision on Fraudulent, Pornographic, Insulting, Script-based Users
Method | F1 | Recall | Precision
DS-DT | 0.812 | 0.738 | 0.902
E-N | 0.679 | 0.665 | 0.693
DDTLS | 0.804 | 0.716 | 0.917
Truser | 0.608 | 0.531 | 0.711
A-D | 0.493 | 0.408 | 0.622
F1-measure, Recall and Precision on Total Users
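
The F1 column can be cross-checked against the other two columns with the usual definition F1 = 2PR / (P + R). The sketch below recomputes F1 for every method and the gap between the best and worst F1 values. The gap comes to 31.9 percentage points (DS-DT versus A-D), which matches the abstract. The helper name `f1_score` is chosen for this example.

```python
# (reported F1, recall, precision) per method, copied from the table.
rows = {
    "DS-DT": (0.812, 0.738, 0.902),
    "E-N": (0.679, 0.665, 0.693),
    "DDTLS": (0.804, 0.716, 0.917),
    "Truser": (0.608, 0.531, 0.711),
    "A-D": (0.493, 0.408, 0.622),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = {
    name: f1_score(precision, recall)
    for name, (_, recall, precision) in rows.items()
}

for name, (reported, _, _) in rows.items():
    print(name, reported, round(recomputed[name], 3))

gap_points = (rows["DS-DT"][0] - rows["A-D"][0]) * 100
print(round(gap_points, 1))
```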
Method | χ²
DS-DT | 533.65
E-N | 688.15
DDTLS | 563.46
Truser | 579.21
A-D | 756.87
Chi-Square Value
[1] Cyberspace Administration of China. Regulations on Ecological Governance of Network Information Content[EB/OL]. [2022-10-31]. http://www.cac.gov.cn/2019-12/20/c_1578375159509309.htm.
[2] Yu Z D, Yu H Q. Untrusted User Detection in Microblogs[C]// Proceedings of the 13th International Conference on Trust, Security and Privacy in Computing and Communications. IEEE, 2014: 558-564.
[3] Dempster A P. Upper and Lower Probabilities Induced by a Multivalued Mapping[J]. The Annals of Mathematical Statistics, 1967, 38(2): 325-339.
doi: 10.1214/aoms/1177698950
[4] Erşahin B, Aktaş Ö, Kılınç D, et al. Twitter Fake Account Detection[C]// Proceedings of the 2017 International Conference on Computer Science and Engineering(UBMK). IEEE, 2017: 388-392.
[5] Wu Y H, Fang Y Z, Shang S K, et al. A Novel Framework for Detecting Social Bots with Deep Neural Networks and Active Learning[J]. Knowledge-Based Systems, 2021, 211: 106525.
doi: 10.1016/j.knosys.2020.106525
[6] Liang Xiaohe, Tian Ruya, Wu Lei, et al. Microblog Similarity Based on Super Network and Its Application in Microblog Public Opinion Topic Detection[J]. Library and Information Service, 2020, 64(11): 77-86.
doi: 10.13266/j.issn.0252-3116.2020.11.009
[7] Mccord M, Chuah M. Spam Detection on Twitter Using Traditional Classifiers[C]// Proceedings of the 8th International Conference on Autonomic and Trusted Computing. Springer, 2011: 175-186.
[8] Chen Huimin, Jin Sichen, Lin Wei, et al. Quantitative Analysis on the Communication of COVID-19 Related Social Media Rumors[J]. Journal of Computer Research and Development, 2021, 58(7): 1366-1384.
[9] Jr Barbon S, Campos G F C, Tavares G M, et al. Detection of Human, Legitimate Bot, and Malicious Bot in Online Social Networks Based on Wavelets[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2018, 14(1s): Article No.26.
[10] Jia Junjie, Duan Chaoqiang. A Shilling Attack Detection Algorithm Based on Score Dispersion[J]. Computer Engineering & Science, 2022, 44(3): 554-562.
[11] He Peng, Wu Hao, Zeng Cheng, et al. Truser: An Approach to Service Recommendation Based on Trusted Users[J]. Chinese Journal of Computers, 2019, 42(4): 851-863.
[12] Alsmadi I, O’Brien M J. How Many Bots in Russian Troll Tweets?[J]. Information Processing & Management, 2020, 57(6): 102303.
doi: 10.1016/j.ipm.2020.102303
[13] Gupta A, Lamba H, Kumaraguru P. $1.00 per RT #BostonMarathon #PrayForBoston:Analyzing Fake Content on Twitter[C]// Proceedings of the 2013 APWG eCrime Researchers Summit. IEEE, 2013: 1-12.
[14] Kagan D M, Elovichi Y, Fire M. Generic Anomalous Vertices Detection Utilizing a Link Prediction Algorithm[J]. Social Network Analysis and Mining, 2018, 8(1): 1-13.
doi: 10.1007/s13278-017-0479-5
[15] Wang Dan, Zhang Haitao, Liu Yashu, et al. Sentiment Analysis and Ideological Guidance of Key Nodes in Micro-Blog Public Opinion[J]. Library and Information Service, 2019, 63(4): 15-22.
doi: 10.13266/j.issn.0252-3116.2019.04.002
[16] Tan Kan, Gao Min, Li Wentao, et al. Two-Layer Sampling Active Learning Algorithm for Social Spammer Detection[J]. Acta Automatica Sinica, 2017, 43(3): 448-461.
[17] Shafer G A. A Mathematical Theory of Evidence[J]. Technometrics, 1978, 20(1): 106.
[18] Zadeh L A. A Simple View of the Dempster-Shafer Theory of Evidence and Its Implication for the Rule of Combination[J]. AI Magazine, 1986, 7(2):85-90.
[19] Murphy C K. Combining Belief Functions When Evidence Conflicts[J]. Decision Support Systems, 2000, 29(1): 1-9.
doi: 10.1016/S0167-9236(99)00084-6
[20] Yager R R. On the Dempster-Shafer Framework and New Combination Rules[J]. Information Sciences, 1987, 41(2): 93-137.
doi: 10.1016/0020-0255(87)90007-7
[21] Xu Peng, Lin Sen. Internet Traffic Classification Using C4.5 Decision Tree[J]. Journal of Software, 2009, 20(10): 2692-2704.
doi: 10.3724/SP.J.1001.2009.03444
[22] Shen Wang, Dai Wang, Gao Xueqian, et al. Research on Credibility Evaluation Method of Social Network Users Based on Multigraph: Perspective on Cyberbullying and Privacy Disclosure[J]. Journal of Modern Information, 2020, 40(8): 27-37.
doi: 10.3969/j.issn.1008-0821.2020.08.004
[23] Ming Yiyang, Liu Xiaojie. Sensitive Information Detection Based on Phrase-Level Sentiment Analysis[J]. Journal of Sichuan University (Natural Science Edition), 2019, 56(6): 1042-1048.
[24] Fu Cong, Yu Dunhui, Zhang Lingli. Study on Identification Method for Change Form of Chinese Sensitive Words[J]. Application Research of Computers, 2019, 36(4): 988-991.
[25] Jkiss. GitHub - jkiss/sensitive-words: Lexicon of Common Internet Sensitive Words[DS/OL]. (2018-12-04). [2022-04-29]. https://github.com/jkiss/sensitive-words.
[26] Harris Z S. Distributional Structure[J]. WORD, 1954, 10(2-3): 146-162.
doi: 10.1080/00437956.1954.11659520
[27] Ma Chao. Study on the Categories and Structure of Health Rumor Denials Community: From the Perspective of Rumor Cooperative Governance[J]. Journal of Intelligence, 2019, 38(1): 96-105.
[28] Sun Chenchen, Shen Derong, Shan Jing, et al. WSR: A Semantic Relatedness Measure Based on Wikipedia Structure[J]. Chinese Journal of Computers, 2012, 35(11): 2361-2370.
doi: 10.3724/SP.J.1016.2012.02361
[29] Sun Quan, Ye Xiuqing, Gu Weikang. A New Combination Rules of Evidence Theory[J]. Acta Electronica Sinica, 2000, 28(8): 117-119.
[30] Lu Wenxing, Liang Changyong, Ding Yong. A Method Determining the Objective Weights of Experts Based on Evidence Distance[J]. Chinese Journal of Management Science, 2008, 16(6): 95-99.
[31] Jousselme A L, Grenier D, Bossé É. A New Distance Between Two Bodies of Evidence[J]. Information Fusion, 2001, 2(2): 91-101.
doi: 10.1016/S1566-2535(01)00026-4
[32] Bi Wenhao, Zhang An, Li Chong. Weighted Evidence Combination Method Based on New Evidence Conflict Measurement Approach[J]. Control and Decision, 2016, 31(1): 73-78.
[33] Wu Jianyun, Xu Mingzhu. Video Personalized Recommendation Based on User Profile and Video Interest Tags[J]. Information Science, 2021, 39(1): 128-134.
[34] Li Ye, Wang Yagang, Xu Xiaoming. Research on Convergence and Conflict Treatment in Evidence Fusion[J]. Systems Engineering and Electronics, 2012, 34(6): 1113-1119.
[35] Wu Bao, Chi Renyong. A Trusted Measurement Method for Social Network Users That Integrates Sentiment Analysis and User Popularity[J]. Journal of Systems Science and Mathematical Sciences, 2021, 41(4): 1091-1107.
doi: 10.12341/jssms20251
[36] Lai Maosheng, Wang Lin, Li Yuning. Survey and Analysis of the Frontiers in Information Science[J]. Library and Information Service, 2008, 52(3): 6-10.
[37] Li K H, Huang Z, Cheng Y C, et al. A Maximal Figure-of-Merit Learning Approach to Maximizing Mean Average Precision with Deep Neural Network Based Classifiers[C]// Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2014: 4503-4507.
[38] Wang Chong, Liu Liming. Setting Method of Goodness of Fit Test Statistics[J]. Statistics & Decision, 2010(5): 154-156.
[39] Yang Yu. Evaluation and Analysis of Weighting Methods in Multi Index Comprehensive Evaluation[J]. Statistics & Decision, 2006(13): 17-19.