Please wait a minute...
Advanced Search
数据分析与知识发现  2020, Vol. 4 Issue (4): 72-82    DOI: 10.11925/infotech.2096-3467.2019.0561
  研究论文 本期目录 | 过刊浏览 | 高级检索 |
基于网络贝叶斯信息准则算法的社区数量预测研究*
李文政1,顾益军1(),闫红丽2
1 中国人民公安大学信息技术与网络安全学院 北京 100038
2 中国人民公安大学国家安全与反恐怖学院 北京 100038
Predicting Community Numbers with Network Bayesian Information Criterion
Li Wenzheng1,Gu Yijun1(),Yan Hongli2
1 School of Information Technology and Cyber Security, People’s Public Security University of China, Beijing 100038, China
2 School of National Security and Counter Terrorism, People’s Public Security University of China, Beijing 100038, China
全文: PDF(1976 KB)   HTML ( 2
输出: BibTeX | EndNote (RIS)      
摘要 

【目的】 针对部分社区发现算法需提前指定社区数量这一问题提出社区数量预测算法。【方法】 基于贝叶斯信息准则,结合重叠社区发现算法与非重叠社区发现算法各自特征,提出网络贝叶斯信息准则算法用于社区数量预测。【结果】 将本文提出的社区数量预测方法运用到重叠社区发现与非重叠社区发现算法时,预测算法准确程度与稳定程度比Silhouette算法、模块度算法都有所提升。在重叠与非重叠两种情况下,算法准确程度相比Silhouette算法、模块度算法均提高18%以上。【局限】 社区数量预测算法只考虑网络结构,未考虑其他属性。【结论】 基于贝叶斯信息准则的社区数量预测算法可以有效实现网络中社区数量预测。

服务
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章
李文政
顾益军
闫红丽
关键词 复杂网络社区数量算法改进高斯分布贝叶斯信息准则    
Abstract

[Objective] This paper proposes an algorithm to predict the number of communities, aiming to improve the issues facing community detection algorithms. [Methods] First, we modified the Bayesian information criterion with characteristics of overlapping and non-overlapping community detection algorithms. Then, we constructed the Network Bayesian Information Criterion Algorithm to predict the number of communities. [Results] The accuracy and stability of the proposed algorithm were better than those of the Silhouette and Modularity algorithms. The accuracy of the former was 18% higher than those of the latter at least. [Limitations] Our new algorithm only includes the network structures. [Conclusions] The proposed algorithm based on Bayesian information criterion could effectively predict the number of network communities.

Key wordsComplex Network    Number of Communities    Algorithm Improvement    Gaussian Distribution    Bayesian Information Criterion
收稿日期: 2019-05-26     
中图分类号:  TP391  
基金资助:*本文系国家重点研发计划项目(2017YFC0820100);中国人民公安大学校内基科费项目“全球治理观视野下打击防范‘东突’问题对策研究”的研究成果之一(2018JKF212)
通讯作者: 顾益军     E-mail: guyijun@ppsuc.edu.cn
引用本文:   
李文政,顾益军,闫红丽. 基于网络贝叶斯信息准则算法的社区数量预测研究*[J]. 数据分析与知识发现, 2020, 4(4): 72-82.
Li Wenzheng,Gu Yijun,Yan Hongli. Predicting Community Numbers with Network Bayesian Information Criterion. Data Analysis and Knowledge Discovery, DOI:10.11925/infotech.2096-3467.2019.0561.
链接本文:  
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2019.0561
图1  网络中节点间距离示意图
图2  Elbow Method示意图
图3  基于Elbow Method的NBIC
数据集 节点数 边数 真实社区数 预估范围
348 224 6 384 9 15
414 150 3 386 7 13
3980 52 292 6 8
428333 65 1 242 3 9
1367531 239 8 142 4 16
1548841 179 3 155 4 14
2029971 143 2 830 6 12
2363991 214 4 462 9 15
7424642 34 661 4 5
7670202 98 1 978 3 9
7888452 120 1 244 8 10
9298152 182 2 398 6 13
9524062 143 2 856 6 11
12589521 130 1 676 5 14
12771872 132 1 676 5 12
14060856 188 2 642 8 14
14120253 143 1 038 4 12
134943586 247 804 9 16
表1  数据集情况
图4  非重叠社区发现算法的NBIC算法示意图
图5  非重叠社区发现算法的Silhouette算法示意图
图6  非重叠社区发现算法的模块度算法示意图
数据集 NBIC(J) Silhouette(J) Modularity(J) NBIC(MRR) Silhouette(MRR) Modularity(MRR)
348 92.31% 38.46% 38.46% 0.50 0.11 0.11
414 90.91% 54.55% 63.64% 0.50 0.17 0.20
3980 66.67% 33.33% 83.33% 0.33 0.20 0.50
428333 71.43% 71.43% 100.00% 0.33 0.33 1.00
1367531 85.72% 85.72% 92.86% 0.33 0.33 0.20
1548841 100.00% 90.91% 66.67% 1.00 0.48 0.20
2029971 90.91% 54.55% 40.00% 0.52 0.18 0.14
2363991 85.61% 60.00% 69.23% 0.35 0.16 0.20
7424642 100.00% 100.00% 75.00% 1.00 1.00 0.50
7670202 42.86% 85.71% 87.50% 0.20 0.50 0.50
7888452 100.00% 37.50% 88.89% 1.00 0.17 0.50
9298152 81.81% 81.81% 75.00% 0.33 0.33 0.25
9524062 77.77% 55.56% 40.00% 0.33 0.20 0.14
12589521 91.67% 75.00% 100.00% 0.50 0.25 1.00
12771872 100.00% 75.00% 90.00% 1.00 0.29 0.50
14060856 91.67% 41.67% 41.67% 0.50 0.13 0.13
14120253 20.00% 80.00% 50.00% 0.11 0.33 0.17
134943586 78.57% 57.14% 35.00% 0.25 0.14 0.10
评分值 81.55% 65.45% 68.74% 9.08 5.30 6.34
表2  非重叠社区发现算法的社区数量预测实验结果
图7  面向非重叠社区发现算法的社区数量预测结果
数据集 NBIC(J) Silhouette(J) Modularity(J) NBIC(MRR) Silhouette(MRR) Modularity(MRR)
348 84.62% 30.77% 15.38% 0.33 0.10 0.83
414 72.72% 63.64% 72.73% 0.25 0.20 0.25
3980 83.33% 33.30% 33.33% 0.50 0.20 0.20
428333 71.42% 71.42% 14.29% 0.17 0.33 0.14
1367531 78.57% 85.71% 28.57% 0.25 0.33 0.09
1548841 66.67% 83.33% 8.33% 0.20 0.33 0.83
2029971 60.00% 90.00% 60.00% 0.20 0.50 0.20
2363991 100.00% 46.15% 76.92% 1.00 0.12 0.25
7424642 75.00% 25.00% 50.00% 0.50 0.25 0.33
7670202 37.50% 87.50% 87.50% 0.17 0.50 0.50
7888452 100.00% 80.00% 44.44% 1.00 0.33 0.17
9298152 66.67% 41.67% 50.00% 0.20 0.13 0.14
9524062 72.72% 72.72% 0.00% 0.25 0.25 0.09
12589521 58.33% 58.33% 33.33% 0.17 0.17 0.11
12771872 100.00% 20.00% 40.00% 1.00 0.11 0.14
14060856 58.33% 75.00% 25.00% 0.17 0.25 0.10
14120253 40.00% 80.00% 10.00% 0.14 0.33 0.10
134943586 100.00% 71.43% 100.00% 1.00 0.20 1.00
评分值 73.66% 62.00% 41.66% 7.50 4.63 5.48
表3  重叠社区发现算法的社区数量预测实验结果
图8  面向重叠社区发现算法的社区数量预测结果
方差分析 NBIC Silhouette Modularity
非重叠社区 0.044 3 0.040 6 0.052 1
重叠社区 0.036 6 0.054 0 0.082 8
平均值 0.040 5 0.047 3 0.067 5
表4  方差分析
[1] 付顺顺, 顾益军, 张大瀚 , 等. 基于改进MCMC方法的重叠社区发现算法[J]. 信息网络安全, 2017(9):138-142.
( Fu Shunshun, Gu Yijun, Zhang Dahan , et al. Overlapping Community Detection Algorithm Based on Improved MCMC Method[J]. Netinfo Security, 2017(9):138-142.)
[2] 占文威, 席景科, 王志晓 . 基于相似性模块度的层次聚合社区发现算法[J]. 系统仿真学报, 2017,29(5):1028-1032.
( Zhan Wenwei, Xi Jingke, Wang Zhixiao . Hierarchical Agglomerative Community Detection Algorithm Based on Similarity Modularity[J]. Journal of System Simulation, 2017,29(5):1028-1032.)
[3] Girvan M, Newman M E J. Community Structure in Social and Biological Networks[J]. Proceedings of the National Academy of Sciences, 2002,99(12):7821-7826.
doi: 10.1073/pnas.122653799
[4] 王宇欢 . 在线社会网络中基于属性的重叠社区发现算法研究与应用[D]. 沈阳: 东北大学, 2014.
( Wang Yuhuan . Study on the Overlapping Community Detection Algorithm in Online Social Network and Application[D]. Shenyang: Northeastern University, 2014.)
[5] Newman M E J . Finding and Evaluating Community Structure in Networks[J]. Physical Review E, 2004,69(2):026113.
doi: 10.1103/PhysRevE.69.026113
[6] Newman M E J. C Community Detection and Graph Partitioning[J]. EPL (Europhysics Letters), 2013, 103(2): Article No. 28003.
[7] Ma T, Xia Z . An Improved Label Propagation Algorithm Based on Node Importance and Random Walk for Community Detection[J]. Modern Physics Letters B, 2017,31(14):1750162.
doi: 10.1142/S0217984917501627
[8] De Meo P, Ferrara E, Fiumara G, et al. Generalized Louvain Method for Community Detection in Large Networks[C]// Proceedings of the 11th International Conference on Intelligent Systems Design and Applications. IEEE, 2011: 88-93.
[9] McAuley J, Leskovec J. Learning to Discover Social Circles in Ego Networks[C]// Proceedings of the 25th International Conference on Neural Information Processing Systems. 2012: 539-547.
[10] Narantsatsralt U U, Kang S. Social Network Community Detection Using Agglomerative Spectral Clustering[J]. Complexity, 2017: Article ID 3719428.
[11] Maulik U, Bandyopadhyay S . Performance Evaluation of Some Clustering Algorithms and Validity Indices[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002,24(12):1650-1654.
doi: 10.1109/TPAMI.2002.1114856
[12] Newman M E J . Fast Algorithm for Detecting Community Structure in Networks[J]. Physical Review E, 2004,69(6):066133.
doi: 10.1103/PhysRevE.69.066133
[13] 桂春 . 复杂网络中的社团发现与链路预测[D]. 兰州: 兰州大学, 2018.
( Gui Chun . Community Detection and Link Prediction of Complex Networks[D]. Lanzhou: Lanzhou University, 2018.)
[14] Rousseeuw P J . Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis[J]. Journal of Computational and Applied Mathematics, 1987,20:53-65.
doi: 10.1016/0377-0427(87)90125-7
[15] Arnaboldi V, Passarella A, Conti M , et al. Online Social Networks: Human Cognitive Constraints in Facebook and Twitter Personal Graphs[M]. Elsevier, 2015: 59-62.
[16] 杜文举, 张建刚, 俞建宁 . 复杂公交社团网络模型的稳定性研究[J]. 计算机工程与应用, 2017,53(23):39-46.
( Du Wenju, Zhang Jian’gang, Yu Jianning . Research on Stability of Complex Bus Community Network Model[J]. Computer Engineering and Applications, 2017,53(23):39-46.)
[17] 蔡彪, 庹先国, 桑强 , 等. 复杂网络中基于三角环吸引子的社区检测[J]. 计算机工程, 2016,42(9):197-201.
doi: 10.3969/j.issn.1000-3428.2016.09.035
( Cai Biao, Tuo Xianguo, Sang Qiang , et al. Community Detection in Complex Network Based on Triangle Clique Attractors[J]. Computer Engineering, 2016,42(9):197-201.)
doi: 10.3969/j.issn.1000-3428.2016.09.035
[18] Pelleg D, Moore A W. X-means: Extending K-means with Efficient Estimation of the Number of Clusters[C]// Proceedings of the 17th International Conference on Machine Learning. 2000,1:727-734.
[19] 储岳中 . 一类基于贝叶斯信息准则的k均值聚类算法[J]. 安徽工业大学学报: 自然科学版, 2010,27(4):409-412.
( Chu Yuezhong . An k Means Clustering Algorithm Based on Bayesian Information Criterion[J]. Journal of Anhui University of Technology: Natural Science, 2010,27(4):409-412.)
[20] Chakrabarti S, Khanna R. Structured Learning for Non-Smooth Ranking Losses[C]// Proceedings of SIGKDD Conference, ACM, 2008: 88-96.
[21] Kass R E, Wasserman L . A Reference Bayesian Test for Nested Hypotheses and Its Relationship to the Schwarz Criterion[J]. Publications of the American Statistical Association, 1995,90(431):928-934.
[22] 侯作勋, 韩培, 张宏伟 , 等. 一种硬件友好型自适应K-均值学习算法[J]. 航天返回与遥感, 2017,38(3):68-77.
( Hou Zuoxun, Han Pei, Zhang Hongwei , et al. An Hardware-friendly Adaptive K-means Learning Algorithm[J]. Spacecraft Recovery & Remote Sensing, 2017,38(3):68-77.)
[23] 于剑, 程乾生 . 模糊聚类方法中的最佳聚类数的搜索范围[J]. 中国科学E辑:技术科学, 2002,32(2):274-280.
( Yu Jian, Cheng Qiansheng . Search Range of Optimal Cluster Number in Fuzzy Clustering Method[J]. Scientia Sinica:Technologica, 2002,32(2):274-280.)
[24] Syakur M A, Khotimah B K, Rochman E M S, et al. Integration K-Means Clustering Method and Elbow Method for Identification of the Best Customer Profile Cluster[J]. IOP Conference Series: Materials Science and Engineering, 2018,336(1):012017.
doi: 10.1088/1757-899X/336/1/012017
[25] 吴广建, 章剑林, 袁丁 . 基于K-means的手肘法自动获取K值方法研究[J]. 软件, 2019,40(5):167-170.
( Wu Guangjian, Zhang Jianlin, Yuan Ding . Automatically Obtaining K Value Based on K-means Elbow Method[J]. Computer Engineering & Software, 2019,40(5):167-170.)
[26] Stanford Network Analysis Project. Stanford Large Network Dataset Collection[EB/OL]. [ 2019- 01- 15]. http://snap.stanford.edu/data/index.html.
[1] 关鹏,王曰芬. 国内外专利网络研究进展*[J]. 数据分析与知识发现, 2020, 4(1): 26-39.
[2] 李想,钱晓东. 商品在线评价对消费趋同影响研究*[J]. 数据分析与知识发现, 2019, 3(3): 102-111.
[3] 严娇,马静,房康. 基于融合共现距离的句法网络下文本语义相似度计算 *[J]. 数据分析与知识发现, 2019, 3(12): 93-100.
[4] 蒋武轩,熊回香,叶佳鑫,安宁. 网络社交平台中社群标签动态生成研究 *[J]. 数据分析与知识发现, 2019, 3(10): 98-109.
[5] 钱晓东,李敏. 基于复杂网络重叠社区的电子商务用户复合类型识别*[J]. 数据分析与知识发现, 2018, 2(6): 79-91.
[6] 陈云伟,张瑞红. 用于情报挖掘的典型网络社团划分算法比较研究*[J]. 数据分析与知识发现, 2018, 2(10): 84-94.
[7] 刘冰瑶,马静,李晓峰. 一种“特征降维”文本复杂网络的话题表示模型*[J]. 数据分析与知识发现, 2017, 1(11): 53-61.
[8] 吴江,陈君,张劲帆. 协同创新中知识供需系统的模拟研究*[J]. 现代图书情报技术, 2016, 32(9): 27-33.
[9] 叶腾,韩丽川,邢春晓,张妍. 基于复杂网络的虚拟社区创新知识传播机制研究*[J]. 现代图书情报技术, 2016, 32(7-8): 70-77.
[10] 夏立新,谭荧. LOD的网络结构分析与可视化*[J]. 现代图书情报技术, 2016, 32(1): 65-72.
[11] 王小立. 智能多Agent网络的微信信息传播仿真研究[J]. 现代图书情报技术, 2015, 31(6): 85-92.
[12] 杨宁, 黄飞虎, 文奕, 陈云伟. 基于微博用户行为的观点传播模型[J]. 现代图书情报技术, 2015, 31(12): 34-41.
[13] 杜坤, 刘怀亮, 郭路杰. 结合复杂网络的特征权重改进算法研究[J]. 现代图书情报技术, 2015, 31(11): 26-32.
[14] 朱侯. 考虑信任与权威影响的社会网络-舆论协同演化的研究[J]. 现代图书情报技术, 2015, 31(10): 50-57.
[15] 何玉梅, 齐佳音, 刘慧丽. 微博局部世界演化模型探究*[J]. 现代图书情报技术, 2014, 30(5): 66-73.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
版权所有 © 2015 《数据分析与知识发现》编辑部
地址:北京市海淀区中关村北四环西路33号 邮编:100190
电话/传真:(010)82626611-6626,82624938
E-mail:jishu@mail.las.ac.cn