Please wait a minute...
Advanced Search
数据分析与知识发现  2021, Vol. 5 Issue (5): 41-50     https://doi.org/10.11925/infotech.2096-3467.2020.1208
     研究论文 本期目录 | 过刊浏览 | 高级检索 |
基于节点向量表示的模糊重叠社区划分算法*
陈文杰(),文奕,杨宁
中国科学院成都文献情报中心 成都 610041
Fuzzy Overlapping Community Detection Algorithm Based on Node Vector Representation
Chen Wenjie(),Wen Yi,Yang Ning
Chengdu Library and Information Center, Chinese Academy of Sciences, Chengdu 610041, China
全文: PDF (1290 KB)   HTML ( 13
输出: BibTeX | EndNote (RIS)      
摘要 

【目的】 针对现有模糊重叠社区划分算法执行效率较差和准确度较低的问题,提出一种基于节点向量表示的模糊社区划分算法。【方法】 使用由节点重要性引导的随机游走策略生成节点序列,将节点序列视作语料库中的句子,利用Skip-gram模型训练得到节点向量,并将高斯混合模型引入模糊社区划分算法FCM(Fuzzy c-Means)中实现多峰值节点数据拟合,通过最大化模块度得到最佳的社区数目。【结果】 相比经典的社区划分方法,该算法在真实网络Jazz和人工网络N1mu=0.5)上的EQ值分别提高了7.0%和9.7%,能够更准确地划分出网络中的社区结构。【局限】 在向量的表示学习中仅考虑复杂网络的拓扑结构信息,而忽略了节点属性信息和边上标签信息。【结论】 基于节点向量表示的模糊重叠社区划分算法可以有效完成复杂网络的社区划分任务。

服务
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章
陈文杰
文奕
杨宁
关键词 复杂网络社区结构表示学习随机游走    
Abstract

[Objective] This paper proposed a fuzzy community partition algorithm based on node vector representation,aiming to solve the problems of poor efficiency and accuracy of existing fuzzy overlapping community partition algorithms. [Methods] Firstly, the random walk strategy guided by node importance is used to generate the walk sequence, and then the skip-gram model is used to train the node vector. Then, the Gaussian mixture model is introduced into the community partition to realize the multi peak node data fitting. Finally, the optimal number of communities is obtained by maximizing the modularity. [Results] Compared with the classical community detection method, the EQ values of the algorithm on the real network jazz and artificial network N1 (mu = 0.5) are increased by 7.0% and 9.7% respectively, which can more accurately detect the community structure in the network. [Limitations] In the vector representation learning, only the topological structure information of complex network is considered, while the node attribute information and edge label information are ignored. [Conclusions] The fuzzy overlapping community detection algorithm based on node vector representation can effectively complete the community division task of complex network.

Key wordsComplex Network    Community Structure    Representation Learning    Random Walk
收稿日期: 2020-12-03      出版日期: 2021-05-27
ZTFLH:  TP393  
基金资助:*本文系中国科学院“十三五”信息化专项的研究成果之一(XXH13506)
通讯作者: 陈文杰     E-mail: chenwj@clas.ac.cn
引用本文:   
陈文杰,文奕,杨宁. 基于节点向量表示的模糊重叠社区划分算法*[J]. 数据分析与知识发现, 2021, 5(5): 41-50.
Chen Wenjie,Wen Yi,Yang Ning. Fuzzy Overlapping Community Detection Algorithm Based on Node Vector Representation. Data Analysis and Knowledge Discovery, 2021, 5(5): 41-50.
链接本文:  
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.1208      或      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I5/41
Fig.1  FONV算法流程
Fig.2  Skip-gram模型(c=2)
网络名 节点数 边数
Football 115 613
Jazz 198 5 483
Netscience 1 461 2 742
Table 1  真实网络
参数 N1 N2
节点数目 500 5 000
节点平均度 30 200
节点最大度 60 500
社区最小节点数目 50 300
社区最大节点数目 100 1 000
混合参数mu 0~1 0~1
Table 2  人工网络
Fig.3  节点向量可视化
算法 Football Jazz Netscience N1 N2
NiWalk2Vec 1.71s 3.25s 34.14s 9.31s 5.96min
NMF 0.56s 1.01s 101.35s 2.76s 16.51min
Spectral Mapping 0.06s 0.09s 86.92s 0.58s 32.82min
Table 3  平均运行时间对比
算法 Football Jazz Netscience
GCE 59.5% 28.2% 54.7%
COPRA 60.3% 41.4% 74.9%
BMLPA 60.1% 20.6% 71.4%
CPM 35.7% 9.4% 75.4%
G-FCM 60.4% 44.3% 79.2%
Table 4  EQ值对比
Fig.4  人工网络上的EQ和NMI
算法 EQ NMI 运行时间
NiWalk2Vec 35.1% 100% 9.31s
DeepWalk 33.4% 98.2% 7.46s
Table 5  NiWalk2Vec和DeepWalk的对比
向量维度 EQ NMI
5 28.9% 77.1%
10 32.8% 96.5%
20 33.0% 99.3%
50 35.1% 100%
100 30.7% 94.7%
Table 6  EQ和NMI随向量维度变化
[1] 赵卫绩, 张凤斌, 刘井莲. 复杂网络社区发现研究进展[J]. 计算机科学, 2020,47(2):10-20.
[1] ( Zhao Weiji, Zhang Fengbin, Liu Jinglian. Review on Community Detection in Complex Networks[J]. Computer Science, 2020,47(2):10-20.)
[2] 肖婧, 张永建, 许小可. 复杂网络模糊重叠社区检测研究进展[J]. 复杂系统与复杂性科学, 2017,14(3):8-29.
[2] ( Xiao Jing, Zhang Yongjian, Xu Xiaoke. Research Progress of Fuzzy Overlapping Community Detection in Complex Networks[J]. Complex Systems and Complexity Science, 2017,14(3):8-29.)
[3] Mikolov T, Sutskever I, Chen K, et al. Distributed Representations of Words and Phrases and Their Compositionality[C]// Proceedings of Advances in Neural Information Processing Systems. 2013: 3111-3119.
[4] Perozzi B, Alrfou R, Skiena S. DeepWalk: Online Learning of Social Representations[C]// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Detection and Data Mining. 2014: 701-710.
[5] Bezdek J C, Ehrlich R, Full W. FCM: The Fuzzy c-Means Clustering Algorithm[J]. Computers & Geoences, 1984,10(2/3):191-203.
[6] Yu S X, Shi J. Multiclass Spectral Clustering[C]// Proceedings of the 9th IEEE International Conference on Computer Vision. IEEE Computer Society, 2003.
[7] Zhang S, Wang R S, Zhang X S. Identification of Overlapping Community Structure in Complex Networks Using Fuzzy c-means Clustering[J]. Physica A: Statistical Mechanics and Its Applications, 2007,374(1):483-490.
doi: 10.1016/j.physa.2006.07.023
[8] Wang W J, Liu D, Liu X, et al. Fuzzy Overlapping Community Detection Based on Local Random Walk and Multidimensional Scaling[J]. Physica A: Statistical Mechanics and Its Applications, 2013,392(24):6578-6586.
doi: 10.1016/j.physa.2013.08.028
[9] 孙延维, 彭智明, 李健波. 基于粒子群优化与模糊聚类的社区发现算法[J]. 重庆邮电大学学报(自然科学版), 2015,27(5):660-666.
[9] ( Sun Weiyan, Peng Zhiming, Li Jianbo. Community Detection Algorithm Based on Particle Swarm Optimization and Fuzzy Clustering[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2015,27(05):660-666.)
[10] 付饶, 孟凡荣, 邢艳. 基于节点重要性与相似性的重叠社区发现算法[J]. 计算机工程, 2018,44(9):192-198.
[10] ( Fu Rao, Meng Fanrong, Xing Yan. Overlapping Community Discovery Algorithm Based on Node Importance and Similarity[J]. Computer Engineering, 2018,44(9):192-198.)
[11] 汪晓锋, 刘功申, 李建华. 基于模糊聚类的多分辨率社区发现方法[J]. 电子与信息学报, 2017,39(9):2033-2039.
[11] ( Wang Xiaofeng, Liu Gongshen, Li Jianhua. Multiresolution Community Detection Based on Fuzzy Clustering[J]. Journal of Electronics & Information Technology, 2017,39(9):2033-2039.)
[12] 段忠祥. 复杂网络模糊重叠社区结构自动检测仿真[J]. 计算机仿真, 2020,37(6):352-356.
[12] ( Duan Zhongxiang. Automatic Detection and Simulation of Complex Network Fuzzy Overlapping Community Structure[J]. Computer Simulation, 2020,37(6):352-356.)
[13] Lee D D, Seung H S. Learning the Parts of Objects by Non-Negative Matrix Factorization[J]. Nature, 1999,401:788-791.
pmid: 10548103
[14] Zhang S H, Wang R S, Zhang X S. Uncovering Fuzzy Community Structure in Complex Networks[J]. Physical Review E Statal Nonlinear & Soft Matter Physics, 2007,76(4):046103.
[15] Psorakis I, Roberts S, Ebden M, et al. Overlapping Community Detection Using Bayesian Non-Negative Matrix Factorization[J]. Physical Review E Statal Nonlinear & Soft Matter Physics, 2011,83(6):066114.
[16] 李玉翔, 李弼程, 郭志刚. 基于非负矩阵分解的网络重叠社区发现研究[J]. 系统仿真学报, 2014,26(3):643-649.
[16] ( Li Yuxiang, Li Bicheng, Guo Zhigang. Research on Overlapping Community Detection in Networks Using Non-negative Matrix Factorization[J]. Journal of System Simulation, 2014,26(3):643-649.)
[17] Ye F H, Chen C, Zheng Z B, et al. Discrete Overlapping Community Detection with Pseudo Supervision[C]// Proceedings of 2019 IEEE International Conference on Data Mining (ICDM). IEEE, 2020.
[18] Gregory S. Finding Overlapping Communities in Networks by Label Propagation[J]. New Journal of Physics, 2009,12(10):2011-2024.
[19] 刘世超, 朱福喜, 甘琳. 基于标签传播概率的重叠社区发现算法[J]. 计算机学报, 2016,39(4):717-729.
[19] ( Liu Shichao, Zhu Fuxi, Gan Lin. A Label-Propagation-Probability-Based Algorithm for Overlapping Community Detection[J]. Chinese Journal of Computers, 2016,39(4):717-729.)
[20] 邓琨, 李文平, 陈丽 , 等. 一种新的基于标签传播的复杂网络重叠社区识别算法[J]. 控制与决策, 2020,35(11):2733-2742.
[20] ( Deng Kun, Li Wenping, Chen Li, et al. A Novel Algorithm for Overlapping Community Detection Based on Label Propagation in Complex Networks[J]. Control and Decision, 2020,35(11):2733-2742.)
[21] 吴清寿, 陈荣旺, 余文森 , 等. 融合标签预处理与节点影响力的重叠社区发现算法[J]. 计算机应用, 2020,40(12):3578-3585.
[21] ( Wu Qingshou, Chen Rongwang, Yu Wensen, et al. Overlapping Community Detection Algorithm Integrating Label Preprocessing and Node Influence[J]. Journal of Computer Applications, 2020,40(12):3578-3585.)
[22] 陈俊宇, 周刚, 熊小兵. 一种采用邻居投票机制的重叠社区发现方法[J]. 小型微型计算机系统, 2014,35(10):2272-2277.
[22] ( Chen Junyu, Zhou Gang, Xiong Xiaobing. Detecting Overlapping Community Structure with Neighbor Voting[J]. Journal of Chinese Computer Systems, 2014,35(10):2272-2277.)
[23] Nepusz T, Petróczi A, Négyessy L, et al. Fuzzy Communities and the Concept of Bridgeness in Complex Networks[J]. Physical Review E Statal Nonlinear & Soft Matter Physics, 2008,77(1):016107.
[24] Nicosia V, Mangioni G, Carchiolo V, et al. Extending the Definition of Modularity to Directed Graphs with Overlapping Communities[J]. Journal of Statal Mechanics Theory & Experiment, 2009(3):3166-3168.
[25] 涂存超, 杨成, 刘知远 , 等. 网络表示学习综述[J]. 中国科学:信息科学, 2017,47(8):980-996.
[25] ( Tu Cunchao, Yang Cheng, Liu Zhiyuan, et al. Network Representation Learning: An Overview[J]. Scientia Sinica (Informationis), 2017,47(8):980-996.)
[26] 韩忠明, 刘雯, 李梦琪 , 等. 基于节点向量表达的复杂网络社团划分算法[J]. 软件学报, 2019,30(4):1045-1061.
[26] ( Han Zhongming, Liu Wen, Li Mengqi, et al. Community Detection Algorithm Based on Node Embedding Vector Representation[J]. Journal of Software, 2019,30(4):1045-1061.)
[27] Rhouma D, Romdhane L B. An Efficient Algorithm for Community Mining with Overlap in Social Networks[J]. Expert Systems with Applications, 2014,41(9):4309-4321.
doi: 10.1016/j.eswa.2014.01.002
[28] 赵泉华, 李晓丽, 赵雪梅 , 等. 基于空间约束Student’s-T混合模型的模糊聚类图像分割[J]. 控制与决策, 2016,31(11):2065-2070.
[28] ( Zhao Quanhua, Li Xiaoli, Zhao Xuemei, et al. Fuzzy Clustering Algorithm Based on Spatially Constrained Student’s-T Mixture Model for Image Segmentation[J]. Control and Decision, 2016,31(11):2065-2070.)
[29] Rasmussen C E. The Infinite Gaussian Mixture Model[C]// Proceedings of the Advances in Neural Information Processing Systems 12. 2000: 554-560.
[30] Clauset A, Newman M E, Moore C. Finding Community Structure in Very Large Networks[J]. Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics, 2004,70(6):066111.
doi: 10.1103/PhysRevE.70.066111
[31] Lancichinetti A, Fortunato S. Benchmarks for Testing Community Detection Algorithms on Directed and Weighted Graphs with Overlapping Communities[J]. Physical Review. E, Statistical,Nonlinear and Soft Matter Physics, 2009,80(2):016118.
doi: 10.1103/PhysRevE.80.016118
[32] van der Maaten L, Hinton G. Visualizing Data Using t-SNE[J]. Journal of Machine Learning Research, 2008,9(86):2579-2605.
[33] Conrad L, Fergal R, Aaron M, et al. Detecting Highly Overlapping Community Structure by Greedy Clique Expansion[OL]. arXiv Preprint, arXiv: 1002. 1827.
[34] Wu Z H, Lin Y F, Gregory S, et al. Balanced Multi-Label Propagation for Overlapping Community Detection in Social Networks[J]. Journal of Computer Science and Technology, 2012,27(3):468-479.
doi: 10.1007/s11390-012-1236-x
[35] Palla G, Derényi I, Farkas I, et al. Uncovering the Overlapping Community Structure of Complex Networks in Nature and Society[J]. Nature, 2005,435(7043):814.
doi: 10.1038/nature03607
[1] 张鑫,文奕,许海云. 一种融合表示学习与主题表征的作者合作预测模型*[J]. 数据分析与知识发现, 2021, 5(3): 88-100.
[2] 张金柱, 于文倩. 基于短语表示学习的主题识别及其表征词抽取方法研究[J]. 数据分析与知识发现, 2021, 5(2): 50-60.
[3] 余传明, 张贞港, 孔令格. 面向链接预测的知识图谱表示模型对比研究*[J]. 数据分析与知识发现, 2021, 5(11): 29-44.
[4] 余传明, 王曼怡, 林虹君, 朱星宇, 黄婷婷, 安璐. 基于深度学习的词汇表示模型对比研究*[J]. 数据分析与知识发现, 2020, 4(8): 28-40.
[5] 李文政,顾益军,闫红丽. 基于网络贝叶斯信息准则算法的社区数量预测研究*[J]. 数据分析与知识发现, 2020, 4(4): 72-82.
[6] 余传明,钟韵辞,林奥琛,安璐. 基于网络表示学习的作者重名消歧研究*[J]. 数据分析与知识发现, 2020, 4(2/3): 48-59.
[7] 丁勇,陈夕,蒋翠清,王钊. 一种融合网络表示学习与XGBoost的评分预测模型*[J]. 数据分析与知识发现, 2020, 4(11): 52-62.
[8] 张金柱,主立鹏,刘菁婕. 基于表示学习的无监督跨语言专利推荐研究*[J]. 数据分析与知识发现, 2020, 4(10): 93-103.
[9] 关鹏,王曰芬. 国内外专利网络研究进展*[J]. 数据分析与知识发现, 2020, 4(1): 26-39.
[10] 余传明,李浩男,王曼怡,黄婷婷,安璐. 基于深度学习的知识表示研究:网络视角*[J]. 数据分析与知识发现, 2020, 4(1): 63-75.
[11] 曾庆田,胡晓慧,李超. 融合主题词嵌入和网络结构分析的主题关键词提取方法 *[J]. 数据分析与知识发现, 2019, 3(7): 52-60.
[12] 曾庆田,戴明弟,李超,段华,赵中英. 轨迹数据融合用户表示方法的重要位置发现*[J]. 数据分析与知识发现, 2019, 3(6): 75-82.
[13] 张金柱,胡一鸣. 融合表示学习与机器学习的专利科学引文标题自动抽取研究*[J]. 数据分析与知识发现, 2019, 3(5): 68-76.
[14] 李想,钱晓东. 商品在线评价对消费趋同影响研究*[J]. 数据分析与知识发现, 2019, 3(3): 102-111.
[15] 张金柱,王玥,胡一鸣. 基于专利科学引文内容表示学习的科学技术主题关联分析研究 *[J]. 数据分析与知识发现, 2019, 3(12): 52-60.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
版权所有 © 2015 《数据分析与知识发现》编辑部
地址:北京市海淀区中关村北四环西路33号 邮编:100190
电话/传真:(010)82626611-6626,82624938
E-mail:jishu@mail.las.ac.cn