Please wait a minute...
Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (5): 41-50    DOI: 10.11925/infotech.2096-3467.2020.1208
Current Issue | Archive | Adv Search |
Fuzzy Overlapping Community Detection Algorithm Based on Node Vector Representation
Chen Wenjie(),Wen Yi,Yang Ning
Chengdu Library and Information Center, Chinese Academy of Sciences, Chengdu 610041, China
Download: PDF (1290 KB)   HTML ( 9
Export: BibTeX | EndNote (RIS)      
Abstract  

[Objective] This paper proposed a fuzzy community partition algorithm based on node vector representation,aiming to solve the problems of poor efficiency and accuracy of existing fuzzy overlapping community partition algorithms. [Methods] Firstly, the random walk strategy guided by node importance is used to generate the walk sequence, and then the skip-gram model is used to train the node vector. Then, the Gaussian mixture model is introduced into the community partition to realize the multi peak node data fitting. Finally, the optimal number of communities is obtained by maximizing the modularity. [Results] Compared with the classical community detection method, the EQ values of the algorithm on the real network jazz and artificial network N1 (mu = 0.5) are increased by 7.0% and 9.7% respectively, which can more accurately detect the community structure in the network. [Limitations] In the vector representation learning, only the topological structure information of complex network is considered, while the node attribute information and edge label information are ignored. [Conclusions] The fuzzy overlapping community detection algorithm based on node vector representation can effectively complete the community division task of complex network.

Key wordsComplex Network      Community Structure      Representation Learning      Random Walk     
Received: 03 December 2020      Published: 27 May 2021
ZTFLH:  TP393  
Fund:The work is supported by the 13th Five-year Informatization Plan of Chinese Academy of Sciences(XXH13506)
Corresponding Authors: Chen Wenjie     E-mail: chenwj@clas.ac.cn

Cite this article:

Chen Wenjie,Wen Yi,Yang Ning. Fuzzy Overlapping Community Detection Algorithm Based on Node Vector Representation. Data Analysis and Knowledge Discovery, 2021, 5(5): 41-50.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.1208     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I5/41

Flow Chart of FONV Algorithm
Skip-gram Model(c=2)
网络名 节点数 边数
Football 115 613
Jazz 198 5 483
Netscience 1 461 2 742
Real Network
参数 N1 N2
节点数目 500 5 000
节点平均度 30 200
节点最大度 60 500
社区最小节点数目 50 300
社区最大节点数目 100 1 000
混合参数mu 0~1 0~1
Artificial Network
Visualization of Node Vector
算法 Football Jazz Netscience N1 N2
NiWalk2Vec 1.71s 3.25s 34.14s 9.31s 5.96min
NMF 0.56s 1.01s 101.35s 2.76s 16.51min
Spectral Mapping 0.06s 0.09s 86.92s 0.58s 32.82min
Comparison of Average Running Time
算法 Football Jazz Netscience
GCE 59.5% 28.2% 54.7%
COPRA 60.3% 41.4% 74.9%
BMLPA 60.1% 20.6% 71.4%
CPM 35.7% 9.4% 75.4%
G-FCM 60.4% 44.3% 79.2%
Comparison of EQ Value
EQ and NMI on Artificial Network
算法 EQ NMI 运行时间
NiWalk2Vec 35.1% 100% 9.31s
DeepWalk 33.4% 98.2% 7.46s
Comparison Between NiWalk2Vec and DeepWalk
向量维度 EQ NMI
5 28.9% 77.1%
10 32.8% 96.5%
20 33.0% 99.3%
50 35.1% 100%
100 30.7% 94.7%
EQ and NMI Vary with Vector Dimensions
[1] 赵卫绩, 张凤斌, 刘井莲. 复杂网络社区发现研究进展[J]. 计算机科学, 2020,47(2):10-20.
[1] ( Zhao Weiji, Zhang Fengbin, Liu Jinglian. Review on Community Detection in Complex Networks[J]. Computer Science, 2020,47(2):10-20.)
[2] 肖婧, 张永建, 许小可. 复杂网络模糊重叠社区检测研究进展[J]. 复杂系统与复杂性科学, 2017,14(3):8-29.
[2] ( Xiao Jing, Zhang Yongjian, Xu Xiaoke. Research Progress of Fuzzy Overlapping Community Detection in Complex Networks[J]. Complex Systems and Complexity Science, 2017,14(3):8-29.)
[3] Mikolov T, Sutskever I, Chen K, et al. Distributed Representations of Words and Phrases and Their Compositionality[C]// Proceedings of Advances in Neural Information Processing Systems. 2013: 3111-3119.
[4] Perozzi B, Alrfou R, Skiena S. DeepWalk: Online Learning of Social Representations[C]// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Detection and Data Mining. 2014: 701-710.
[5] Bezdek J C, Ehrlich R, Full W. FCM: The Fuzzy c-Means Clustering Algorithm[J]. Computers & Geoences, 1984,10(2/3):191-203.
[6] Yu S X, Shi J. Multiclass Spectral Clustering[C]// Proceedings of the 9th IEEE International Conference on Computer Vision. IEEE Computer Society, 2003.
[7] Zhang S, Wang R S, Zhang X S. Identification of Overlapping Community Structure in Complex Networks Using Fuzzy c-means Clustering[J]. Physica A: Statistical Mechanics and Its Applications, 2007,374(1):483-490.
doi: 10.1016/j.physa.2006.07.023
[8] Wang W J, Liu D, Liu X, et al. Fuzzy Overlapping Community Detection Based on Local Random Walk and Multidimensional Scaling[J]. Physica A: Statistical Mechanics and Its Applications, 2013,392(24):6578-6586.
doi: 10.1016/j.physa.2013.08.028
[9] 孙延维, 彭智明, 李健波. 基于粒子群优化与模糊聚类的社区发现算法[J]. 重庆邮电大学学报(自然科学版), 2015,27(5):660-666.
[9] ( Sun Weiyan, Peng Zhiming, Li Jianbo. Community Detection Algorithm Based on Particle Swarm Optimization and Fuzzy Clustering[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2015,27(05):660-666.)
[10] 付饶, 孟凡荣, 邢艳. 基于节点重要性与相似性的重叠社区发现算法[J]. 计算机工程, 2018,44(9):192-198.
[10] ( Fu Rao, Meng Fanrong, Xing Yan. Overlapping Community Discovery Algorithm Based on Node Importance and Similarity[J]. Computer Engineering, 2018,44(9):192-198.)
[11] 汪晓锋, 刘功申, 李建华. 基于模糊聚类的多分辨率社区发现方法[J]. 电子与信息学报, 2017,39(9):2033-2039.
[11] ( Wang Xiaofeng, Liu Gongshen, Li Jianhua. Multiresolution Community Detection Based on Fuzzy Clustering[J]. Journal of Electronics & Information Technology, 2017,39(9):2033-2039.)
[12] 段忠祥. 复杂网络模糊重叠社区结构自动检测仿真[J]. 计算机仿真, 2020,37(6):352-356.
[12] ( Duan Zhongxiang. Automatic Detection and Simulation of Complex Network Fuzzy Overlapping Community Structure[J]. Computer Simulation, 2020,37(6):352-356.)
[13] Lee D D, Seung H S. Learning the Parts of Objects by Non-Negative Matrix Factorization[J]. Nature, 1999,401:788-791.
pmid: 10548103
[14] Zhang S H, Wang R S, Zhang X S. Uncovering Fuzzy Community Structure in Complex Networks[J]. Physical Review E Statal Nonlinear & Soft Matter Physics, 2007,76(4):046103.
[15] Psorakis I, Roberts S, Ebden M, et al. Overlapping Community Detection Using Bayesian Non-Negative Matrix Factorization[J]. Physical Review E Statal Nonlinear & Soft Matter Physics, 2011,83(6):066114.
[16] 李玉翔, 李弼程, 郭志刚. 基于非负矩阵分解的网络重叠社区发现研究[J]. 系统仿真学报, 2014,26(3):643-649.
[16] ( Li Yuxiang, Li Bicheng, Guo Zhigang. Research on Overlapping Community Detection in Networks Using Non-negative Matrix Factorization[J]. Journal of System Simulation, 2014,26(3):643-649.)
[17] Ye F H, Chen C, Zheng Z B, et al. Discrete Overlapping Community Detection with Pseudo Supervision[C]// Proceedings of 2019 IEEE International Conference on Data Mining (ICDM). IEEE, 2020.
[18] Gregory S. Finding Overlapping Communities in Networks by Label Propagation[J]. New Journal of Physics, 2009,12(10):2011-2024.
[19] 刘世超, 朱福喜, 甘琳. 基于标签传播概率的重叠社区发现算法[J]. 计算机学报, 2016,39(4):717-729.
[19] ( Liu Shichao, Zhu Fuxi, Gan Lin. A Label-Propagation-Probability-Based Algorithm for Overlapping Community Detection[J]. Chinese Journal of Computers, 2016,39(4):717-729.)
[20] 邓琨, 李文平, 陈丽 , 等. 一种新的基于标签传播的复杂网络重叠社区识别算法[J]. 控制与决策, 2020,35(11):2733-2742.
[20] ( Deng Kun, Li Wenping, Chen Li, et al. A Novel Algorithm for Overlapping Community Detection Based on Label Propagation in Complex Networks[J]. Control and Decision, 2020,35(11):2733-2742.)
[21] 吴清寿, 陈荣旺, 余文森 , 等. 融合标签预处理与节点影响力的重叠社区发现算法[J]. 计算机应用, 2020,40(12):3578-3585.
[21] ( Wu Qingshou, Chen Rongwang, Yu Wensen, et al. Overlapping Community Detection Algorithm Integrating Label Preprocessing and Node Influence[J]. Journal of Computer Applications, 2020,40(12):3578-3585.)
[22] 陈俊宇, 周刚, 熊小兵. 一种采用邻居投票机制的重叠社区发现方法[J]. 小型微型计算机系统, 2014,35(10):2272-2277.
[22] ( Chen Junyu, Zhou Gang, Xiong Xiaobing. Detecting Overlapping Community Structure with Neighbor Voting[J]. Journal of Chinese Computer Systems, 2014,35(10):2272-2277.)
[23] Nepusz T, Petróczi A, Négyessy L, et al. Fuzzy Communities and the Concept of Bridgeness in Complex Networks[J]. Physical Review E Statal Nonlinear & Soft Matter Physics, 2008,77(1):016107.
[24] Nicosia V, Mangioni G, Carchiolo V, et al. Extending the Definition of Modularity to Directed Graphs with Overlapping Communities[J]. Journal of Statal Mechanics Theory & Experiment, 2009(3):3166-3168.
[25] 涂存超, 杨成, 刘知远 , 等. 网络表示学习综述[J]. 中国科学:信息科学, 2017,47(8):980-996.
[25] ( Tu Cunchao, Yang Cheng, Liu Zhiyuan, et al. Network Representation Learning: An Overview[J]. Scientia Sinica (Informationis), 2017,47(8):980-996.)
[26] 韩忠明, 刘雯, 李梦琪 , 等. 基于节点向量表达的复杂网络社团划分算法[J]. 软件学报, 2019,30(4):1045-1061.
[26] ( Han Zhongming, Liu Wen, Li Mengqi, et al. Community Detection Algorithm Based on Node Embedding Vector Representation[J]. Journal of Software, 2019,30(4):1045-1061.)
[27] Rhouma D, Romdhane L B. An Efficient Algorithm for Community Mining with Overlap in Social Networks[J]. Expert Systems with Applications, 2014,41(9):4309-4321.
doi: 10.1016/j.eswa.2014.01.002
[28] 赵泉华, 李晓丽, 赵雪梅 , 等. 基于空间约束Student’s-T混合模型的模糊聚类图像分割[J]. 控制与决策, 2016,31(11):2065-2070.
[28] ( Zhao Quanhua, Li Xiaoli, Zhao Xuemei, et al. Fuzzy Clustering Algorithm Based on Spatially Constrained Student’s-T Mixture Model for Image Segmentation[J]. Control and Decision, 2016,31(11):2065-2070.)
[29] Rasmussen C E. The Infinite Gaussian Mixture Model[C]// Proceedings of the Advances in Neural Information Processing Systems 12. 2000: 554-560.
[30] Clauset A, Newman M E, Moore C. Finding Community Structure in Very Large Networks[J]. Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics, 2004,70(6):066111.
doi: 10.1103/PhysRevE.70.066111
[31] Lancichinetti A, Fortunato S. Benchmarks for Testing Community Detection Algorithms on Directed and Weighted Graphs with Overlapping Communities[J]. Physical Review. E, Statistical,Nonlinear and Soft Matter Physics, 2009,80(2):016118.
doi: 10.1103/PhysRevE.80.016118
[32] van der Maaten L, Hinton G. Visualizing Data Using t-SNE[J]. Journal of Machine Learning Research, 2008,9(86):2579-2605.
[33] Conrad L, Fergal R, Aaron M, et al. Detecting Highly Overlapping Community Structure by Greedy Clique Expansion[OL]. arXiv Preprint, arXiv: 1002. 1827.
[34] Wu Z H, Lin Y F, Gregory S, et al. Balanced Multi-Label Propagation for Overlapping Community Detection in Social Networks[J]. Journal of Computer Science and Technology, 2012,27(3):468-479.
doi: 10.1007/s11390-012-1236-x
[35] Palla G, Derényi I, Farkas I, et al. Uncovering the Overlapping Community Structure of Complex Networks in Nature and Society[J]. Nature, 2005,435(7043):814.
doi: 10.1038/nature03607
[1] Zhang Xin,Wen Yi,Xu Haiyun. A Prediction Model with Network Representation Learning and Topic Model for Author Collaboration[J]. 数据分析与知识发现, 2021, 5(3): 88-100.
[2] Zhang Jinzhu, Yu Wenqian. Topic Recognition and Key-Phrase Extraction with Phrase Representation Learning[J]. 数据分析与知识发现, 2021, 5(2): 50-60.
[3] Yu Chuanming, Wang Manyi, Lin Hongjun, Zhu Xingyu, Huang Tingting, An Lu. A Comparative Study of Word Representation Models Based on Deep Learning[J]. 数据分析与知识发现, 2020, 4(8): 28-40.
[4] Li Wenzheng,Gu Yijun,Yan Hongli. Predicting Community Numbers with Network Bayesian Information Criterion[J]. 数据分析与知识发现, 2020, 4(4): 72-82.
[5] Zhang Chunjin,Guo Shenghui,Ji Shujuan,Yang Wei,Yi Lei. Group Recommendation Algorithms Based on Implicit Representation Learning of Multi-attribute Ratings[J]. 数据分析与知识发现, 2020, 4(12): 120-135.
[6] Ding Yong,Chen Xi,Jiang Cuiqing,Wang Zhao. Predicting Online Ratings with Network Representation Learning and XGBoost[J]. 数据分析与知识发现, 2020, 4(11): 52-62.
[7] Zhang Jinzhu,Zhu Lipeng,Liu Jingjie. Unsupervised Cross-Language Model for Patent Recommendation Based on Representation[J]. 数据分析与知识发现, 2020, 4(10): 93-103.
[8] Chuanming Yu,Haonan Li,Manyi Wang,Tingting Huang,Lu An. Knowledge Representation Based on Deep Learning:Network Perspective[J]. 数据分析与知识发现, 2020, 4(1): 63-75.
[9] Qingtian Zeng,Xiaohui Hu,Chao Li. Extracting Keywords with Topic Embedding and Network Structure Analysis[J]. 数据分析与知识发现, 2019, 3(7): 52-60.
[10] Qingtian Zeng,Mingdi Dai,Chao Li,Hua Duan,Zhongying Zhao. Discovering Important Locations with User Representation and Trace Data[J]. 数据分析与知识发现, 2019, 3(6): 75-82.
[11] Jinzhu Zhang,Yiming Hu. Extracting Titles from Scientific References in Patents with Fusion of Representation Learning and Machine Learning[J]. 数据分析与知识发现, 2019, 3(5): 68-76.
[12] Xiang Li,Xiaodong Qian. Research on Impact of Commodity Online Evaluation for Consumption Convergence[J]. 数据分析与知识发现, 2019, 3(3): 102-111.
[13] Jinzhu Zhang,Yue Wang,Yiming Hu. Analyzing Sci-Tech Topics Based on Semantic Representation of Patent References[J]. 数据分析与知识发现, 2019, 3(12): 52-60.
[14] Jiao Yan,Jing Ma,Kang Fang. Computing Text Semantic Similarity with Syntactic Network of Co-occurrence Distance[J]. 数据分析与知识发现, 2019, 3(12): 93-100.
[15] Wuxuan Jiang,Huixiang Xiong,Jiaxin Ye,Ning An. Creating Dynamic Tags for Social Networking Groups[J]. 数据分析与知识发现, 2019, 3(10): 98-109.
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn