Please wait a minute...
Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (4): 72-82    DOI: 10.11925/infotech.2096-3467.2019.0561
Current Issue | Archive | Adv Search |
Predicting Community Numbers with Network Bayesian Information Criterion
Li Wenzheng1,Gu Yijun1(),Yan Hongli2
1 School of Information Technology and Cyber Security, People’s Public Security University of China, Beijing 100038, China
2 School of National Security and Counter Terrorism, People’s Public Security University of China, Beijing 100038, China
Download: PDF(1976 KB)   HTML ( 2
Export: BibTeX | EndNote (RIS)      
Abstract  

[Objective] This paper proposes an algorithm to predict the number of communities, aiming to improve the issues facing community detection algorithms. [Methods] First, we modified the Bayesian information criterion with characteristics of overlapping and non-overlapping community detection algorithms. Then, we constructed the Network Bayesian Information Criterion Algorithm to predict the number of communities. [Results] The accuracy and stability of the proposed algorithm were better than those of the Silhouette and Modularity algorithms. The accuracy of the former was 18% higher than those of the latter at least. [Limitations] Our new algorithm only includes the network structures. [Conclusions] The proposed algorithm based on Bayesian information criterion could effectively predict the number of network communities.

Key wordsComplex Network      Number of Communities      Algorithm Improvement      Gaussian Distribution      Bayesian Information Criterion     
Received: 26 May 2019      Published: 01 June 2020
ZTFLH:  TP391  
Corresponding Authors: Gu Yijun     E-mail: guyijun@ppsuc.edu.cn

Cite this article:

Li Wenzheng,Gu Yijun,Yan Hongli. Predicting Community Numbers with Network Bayesian Information Criterion. Data Analysis and Knowledge Discovery, 2020, 4(4): 72-82.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.0561     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I4/72

Distance Between Nodes in the Network
Schematic Diagram of Elbow Method
NBIC Value Based on Elbow Method
数据集 节点数 边数 真实社区数 预估范围
348 224 6 384 9 15
414 150 3 386 7 13
3980 52 292 6 8
428333 65 1 242 3 9
1367531 239 8 142 4 16
1548841 179 3 155 4 14
2029971 143 2 830 6 12
2363991 214 4 462 9 15
7424642 34 661 4 5
7670202 98 1 978 3 9
7888452 120 1 244 8 10
9298152 182 2 398 6 13
9524062 143 2 856 6 11
12589521 130 1 676 5 14
12771872 132 1 676 5 12
14060856 188 2 642 8 14
14120253 143 1 038 4 12
134943586 247 804 9 16
Introduction to the Data Set
NBIC Algorithm for Non-Overlapping Community Detection Algorithm
Silhouette Algorithm for Non-Overlapping Community Detection Algorithm
Modularity Algorithm for Non-Overlapping Community Detection Algorithm
数据集 NBIC(J) Silhouette(J) Modularity(J) NBIC(MRR) Silhouette(MRR) Modularity(MRR)
348 92.31% 38.46% 38.46% 0.50 0.11 0.11
414 90.91% 54.55% 63.64% 0.50 0.17 0.20
3980 66.67% 33.33% 83.33% 0.33 0.20 0.50
428333 71.43% 71.43% 100.00% 0.33 0.33 1.00
1367531 85.72% 85.72% 92.86% 0.33 0.33 0.20
1548841 100.00% 90.91% 66.67% 1.00 0.48 0.20
2029971 90.91% 54.55% 40.00% 0.52 0.18 0.14
2363991 85.61% 60.00% 69.23% 0.35 0.16 0.20
7424642 100.00% 100.00% 75.00% 1.00 1.00 0.50
7670202 42.86% 85.71% 87.50% 0.20 0.50 0.50
7888452 100.00% 37.50% 88.89% 1.00 0.17 0.50
9298152 81.81% 81.81% 75.00% 0.33 0.33 0.25
9524062 77.77% 55.56% 40.00% 0.33 0.20 0.14
12589521 91.67% 75.00% 100.00% 0.50 0.25 1.00
12771872 100.00% 75.00% 90.00% 1.00 0.29 0.50
14060856 91.67% 41.67% 41.67% 0.50 0.13 0.13
14120253 20.00% 80.00% 50.00% 0.11 0.33 0.17
134943586 78.57% 57.14% 35.00% 0.25 0.14 0.10
评分值 81.55% 65.45% 68.74% 9.08 5.30 6.34
Community Number Prediction of Non-Overlapping Community Detection Algorithm
The Prediction of the Number of Communities for Non-Overlapping Community Discovery Algorithm
数据集 NBIC(J) Silhouette(J) Modularity(J) NBIC(MRR) Silhouette(MRR) Modularity(MRR)
348 84.62% 30.77% 15.38% 0.33 0.10 0.83
414 72.72% 63.64% 72.73% 0.25 0.20 0.25
3980 83.33% 33.30% 33.33% 0.50 0.20 0.20
428333 71.42% 71.42% 14.29% 0.17 0.33 0.14
1367531 78.57% 85.71% 28.57% 0.25 0.33 0.09
1548841 66.67% 83.33% 8.33% 0.20 0.33 0.83
2029971 60.00% 90.00% 60.00% 0.20 0.50 0.20
2363991 100.00% 46.15% 76.92% 1.00 0.12 0.25
7424642 75.00% 25.00% 50.00% 0.50 0.25 0.33
7670202 37.50% 87.50% 87.50% 0.17 0.50 0.50
7888452 100.00% 80.00% 44.44% 1.00 0.33 0.17
9298152 66.67% 41.67% 50.00% 0.20 0.13 0.14
9524062 72.72% 72.72% 0.00% 0.25 0.25 0.09
12589521 58.33% 58.33% 33.33% 0.17 0.17 0.11
12771872 100.00% 20.00% 40.00% 1.00 0.11 0.14
14060856 58.33% 75.00% 25.00% 0.17 0.25 0.10
14120253 40.00% 80.00% 10.00% 0.14 0.33 0.10
134943586 100.00% 71.43% 100.00% 1.00 0.20 1.00
评分值 73.66% 62.00% 41.66% 7.50 4.63 5.48
Community Number Prediction of Overlapping Community Detection Algorithm
The Prediction of the Number of Communities for Overlapping Community Discovery Algorithm
方差分析 NBIC Silhouette Modularity
非重叠社区 0.044 3 0.040 6 0.052 1
重叠社区 0.036 6 0.054 0 0.082 8
平均值 0.040 5 0.047 3 0.067 5
Analysis of Variance
[1] 付顺顺, 顾益军, 张大瀚 , 等. 基于改进MCMC方法的重叠社区发现算法[J]. 信息网络安全, 2017(9):138-142.
[1] ( Fu Shunshun, Gu Yijun, Zhang Dahan , et al. Overlapping Community Detection Algorithm Based on Improved MCMC Method[J]. Netinfo Security, 2017(9):138-142.)
[2] 占文威, 席景科, 王志晓 . 基于相似性模块度的层次聚合社区发现算法[J]. 系统仿真学报, 2017,29(5):1028-1032.
[2] ( Zhan Wenwei, Xi Jingke, Wang Zhixiao . Hierarchical Agglomerative Community Detection Algorithm Based on Similarity Modularity[J]. Journal of System Simulation, 2017,29(5):1028-1032.)
[3] Girvan M, Newman M E J. Community Structure in Social and Biological Networks[J]. Proceedings of the National Academy of Sciences, 2002,99(12):7821-7826.
doi: 10.1073/pnas.122653799
[4] 王宇欢 . 在线社会网络中基于属性的重叠社区发现算法研究与应用[D]. 沈阳: 东北大学, 2014.
[4] ( Wang Yuhuan . Study on the Overlapping Community Detection Algorithm in Online Social Network and Application[D]. Shenyang: Northeastern University, 2014.)
[5] Newman M E J . Finding and Evaluating Community Structure in Networks[J]. Physical Review E, 2004,69(2):026113.
doi: 10.1103/PhysRevE.69.026113
[6] Newman M E J. C Community Detection and Graph Partitioning[J]. EPL (Europhysics Letters), 2013, 103(2): Article No. 28003.
[7] Ma T, Xia Z . An Improved Label Propagation Algorithm Based on Node Importance and Random Walk for Community Detection[J]. Modern Physics Letters B, 2017,31(14):1750162.
doi: 10.1142/S0217984917501627
[8] De Meo P, Ferrara E, Fiumara G, et al. Generalized Louvain Method for Community Detection in Large Networks[C]// Proceedings of the 11th International Conference on Intelligent Systems Design and Applications. IEEE, 2011: 88-93.
[9] McAuley J, Leskovec J. Learning to Discover Social Circles in Ego Networks[C]// Proceedings of the 25th International Conference on Neural Information Processing Systems. 2012: 539-547.
[10] Narantsatsralt U U, Kang S. Social Network Community Detection Using Agglomerative Spectral Clustering[J]. Complexity, 2017: Article ID 3719428.
[11] Maulik U, Bandyopadhyay S . Performance Evaluation of Some Clustering Algorithms and Validity Indices[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002,24(12):1650-1654.
doi: 10.1109/TPAMI.2002.1114856
[12] Newman M E J . Fast Algorithm for Detecting Community Structure in Networks[J]. Physical Review E, 2004,69(6):066133.
doi: 10.1103/PhysRevE.69.066133
[13] 桂春 . 复杂网络中的社团发现与链路预测[D]. 兰州: 兰州大学, 2018.
[13] ( Gui Chun . Community Detection and Link Prediction of Complex Networks[D]. Lanzhou: Lanzhou University, 2018.)
[14] Rousseeuw P J . Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis[J]. Journal of Computational and Applied Mathematics, 1987,20:53-65.
doi: 10.1016/0377-0427(87)90125-7
[15] Arnaboldi V, Passarella A, Conti M , et al. Online Social Networks: Human Cognitive Constraints in Facebook and Twitter Personal Graphs[M]. Elsevier, 2015: 59-62.
[16] 杜文举, 张建刚, 俞建宁 . 复杂公交社团网络模型的稳定性研究[J]. 计算机工程与应用, 2017,53(23):39-46.
[16] ( Du Wenju, Zhang Jian’gang, Yu Jianning . Research on Stability of Complex Bus Community Network Model[J]. Computer Engineering and Applications, 2017,53(23):39-46.)
[17] 蔡彪, 庹先国, 桑强 , 等. 复杂网络中基于三角环吸引子的社区检测[J]. 计算机工程, 2016,42(9):197-201.
doi: 10.3969/j.issn.1000-3428.2016.09.035
[17] ( Cai Biao, Tuo Xianguo, Sang Qiang , et al. Community Detection in Complex Network Based on Triangle Clique Attractors[J]. Computer Engineering, 2016,42(9):197-201.)
doi: 10.3969/j.issn.1000-3428.2016.09.035
[18] Pelleg D, Moore A W. X-means: Extending K-means with Efficient Estimation of the Number of Clusters[C]// Proceedings of the 17th International Conference on Machine Learning. 2000,1:727-734.
[19] 储岳中 . 一类基于贝叶斯信息准则的k均值聚类算法[J]. 安徽工业大学学报: 自然科学版, 2010,27(4):409-412.
[19] ( Chu Yuezhong . An k Means Clustering Algorithm Based on Bayesian Information Criterion[J]. Journal of Anhui University of Technology: Natural Science, 2010,27(4):409-412.)
[20] Chakrabarti S, Khanna R. Structured Learning for Non-Smooth Ranking Losses[C]// Proceedings of SIGKDD Conference, ACM, 2008: 88-96.
[21] Kass R E, Wasserman L . A Reference Bayesian Test for Nested Hypotheses and Its Relationship to the Schwarz Criterion[J]. Publications of the American Statistical Association, 1995,90(431):928-934.
[22] 侯作勋, 韩培, 张宏伟 , 等. 一种硬件友好型自适应K-均值学习算法[J]. 航天返回与遥感, 2017,38(3):68-77.
[22] ( Hou Zuoxun, Han Pei, Zhang Hongwei , et al. An Hardware-friendly Adaptive K-means Learning Algorithm[J]. Spacecraft Recovery & Remote Sensing, 2017,38(3):68-77.)
[23] 于剑, 程乾生 . 模糊聚类方法中的最佳聚类数的搜索范围[J]. 中国科学E辑:技术科学, 2002,32(2):274-280.
[23] ( Yu Jian, Cheng Qiansheng . Search Range of Optimal Cluster Number in Fuzzy Clustering Method[J]. Scientia Sinica:Technologica, 2002,32(2):274-280.)
[24] Syakur M A, Khotimah B K, Rochman E M S, et al. Integration K-Means Clustering Method and Elbow Method for Identification of the Best Customer Profile Cluster[J]. IOP Conference Series: Materials Science and Engineering, 2018,336(1):012017.
doi: 10.1088/1757-899X/336/1/012017
[25] 吴广建, 章剑林, 袁丁 . 基于K-means的手肘法自动获取K值方法研究[J]. 软件, 2019,40(5):167-170.
[25] ( Wu Guangjian, Zhang Jianlin, Yuan Ding . Automatically Obtaining K Value Based on K-means Elbow Method[J]. Computer Engineering & Software, 2019,40(5):167-170.)
[26] Stanford Network Analysis Project. Stanford Large Network Dataset Collection[EB/OL]. [ 2019- 01- 15]. http://snap.stanford.edu/data/index.html.
[1] Xiang Li,Xiaodong Qian. Research on Impact of Commodity Online Evaluation for Consumption Convergence[J]. 数据分析与知识发现, 2019, 3(3): 102-111.
[2] Jiao Yan,Jing Ma,Kang Fang. Computing Text Semantic Similarity with Syntactic Network of Co-occurrence Distance[J]. 数据分析与知识发现, 2019, 3(12): 93-100.
[3] Wuxuan Jiang,Huixiang Xiong,Jiaxin Ye,Ning An. Creating Dynamic Tags for Social Networking Groups[J]. 数据分析与知识发现, 2019, 3(10): 98-109.
[4] Xiaodong Qian,Min Li. Identifying E-commerce User Types Based on Complex Network Overlapping Community[J]. 数据分析与知识发现, 2018, 2(6): 79-91.
[5] Yunwei Chen,Ruihong Zhang. Comparing on Community Detection Algorithms for Information Mining[J]. 数据分析与知识发现, 2018, 2(10): 84-94.
[6] Bingyao Liu,Jing Ma,Xiaofeng Li. Topic Representation Model Based on “Feature Dimensionality Reduction”[J]. 数据分析与知识发现, 2017, 1(11): 53-61.
[7] Wu Jiang,Chen Jun,Zhang Jinfan. A Knowledge Supply-Demand Simulation System for Collaborative Innovation[J]. 现代图书情报技术, 2016, 32(9): 27-33.
[8] Ye Teng,Han Lichuan,Xing Chunxiao,Zhang Yan. Knowledge Dissemination Mechanism in Virtual Communities: Case Study Based on Complex Network Theory[J]. 现代图书情报技术, 2016, 32(7-8): 70-77.
[9] Lixin Xia,Ying Tan. Analysis and Visualization of the LOD Network Structure[J]. 现代图书情报技术, 2016, 32(1): 65-72.
[10] Yang Ning, Huang Feihu, Wen Yi, Chen Yunwei. An Opinion Evolution Model Based on the Behavior of Micro-blog Users[J]. 现代图书情报技术, 2015, 31(12): 34-41.
[11] Du Kun, Liu Huailiang, Guo Lujie. Study on the Modified Method of Feature Weighting with Complex Networks[J]. 现代图书情报技术, 2015, 31(11): 26-32.
[12] Zhu Hou. Co-evolution of Social Networks and Public Opinion Considering the Effect of Trust and Authority[J]. 现代图书情报技术, 2015, 31(10): 50-57.
[13] He Yumei, Qi Jiayin, Liu Huili. The Study of Local-world Network Evolution Model Based on Microblog[J]. 现代图书情报技术, 2014, 30(5): 66-73.
[14] Tang Xiaobo, Xiao Lu. Research of Text Feature Extraction on Dependency Parsing Network[J]. 现代图书情报技术, 2014, 30(11): 31-37.
[15] Yang Zhimo, Liu Huailiang, Zhao Hui. An Algorithm of Chinese Text Representation Based on Complex Network[J]. 现代图书情报技术, 2014, 30(11): 38-44.
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn