Please wait a minute...
Advanced Search
数据分析与知识发现  2021, Vol. 5 Issue (11): 102-113     https://doi.org/10.11925/infotech.2096-3467.2021.0323
  研究论文 本期目录 | 过刊浏览 | 高级检索 |
网络结构对链路预测算法的影响研究*——基于元分析视角
吴胜男,蒲虹君,田若楠,梁雯琪,于琦()
山西医科大学管理学院 太原 030001
Network Structure’s Impacts on Link Prediction Algorithm from Meta-Analysis Perspective
Wu Shengnan,Pu Hongjun,Tian Ruonan,Liang Wenqi,Yu Qi()
School of Management, Shanxi Medical University, Taiyuan 030001, China
全文: PDF (1215 KB)   HTML ( 9
输出: BibTeX | EndNote (RIS)      
摘要 

【目的】 在不同结构的网络中,各链路预测算法的预测效果存在不同程度的差异,通过对多项研究的数据进行统计分析,可以系统挖掘网络结构特征中影响链路预测结果的主要参数。【方法】 选取国内外关于链路预测的相关实证研究,最终纳入5篇文献、22个网络、26种算法和278项研究,利用三水平元分析和贝叶斯网络元分析方法探讨网络结构中影响链路预测结果的主要因素及其对各算法预测结果的影响。【结果】 纳入研究的算法总体预测的效应量MD=1.183 2(95%CI:(1.000 5,1.365 9)),网络密度、平均度和聚集系数是影响各算法预测效果的主要因素(Pval<0.05)。亚组分析结果表明:Katz、LHN-II、MFI、LRW、SRW等基于全局信息和准局部信息的链路预测算法在稀疏网络性能更佳,SUCRA值均大于0.5,在稠密网络中网络密度、网络平均度和聚集系数对各类算法的影响差异较大。【局限】 仅从统计学的角度进行分析,并未纳入大规模的文献数据进行进一步的实证分析,结果还较为粗糙。【结论】 本研究将元分析的概念引入复杂网络领域中,丰富了对网络结构与链路预测算法关系探讨的方法与视角,为未来网络结构对链路预测算法影响的相关研究提供新的思路。

服务
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章
吴胜男
蒲虹君
田若楠
梁雯琪
于琦
关键词 链路预测算法预测性能网络结构三水平元分析贝叶斯网络元分析    
Abstract

[Objective] This paper tries to identify the main influencing parameters of the link prediction algorithms with the help of network structures and data from multiple studies. [Methods] We retrieved empirical research on link prediction from China and abroad, which include 5 papers, 22 networks, 26 algorithms and 278 studies. We used three-level meta-analysis and Bayesian network meta-analysis to explore the network structures and their impacts on algorithms’ performance. [Results] The algorithms included in our study generally had a good predictive effect MD=1.183 2 (95%CI: (1.000 5, 1.365 9)). The network density, average degree and clustering coefficient are the main factors affecting the prediction results (Pval<0.05). Katz, LHN-II, MFI, LRW, and SRW algorithms yielded better results with sparse networks and their SUCRA values were greater than 0.5. [Limitations] Our research does not include empirical analysis with large-scale data. [Conclusions] With the help of meta-analysis, our study explores the development directions for the link prediction algorithms.

Key wordsLink Prediction Algorithm    Prediction Accuracy    Network Structure    Three-Level Meta-Analysis    Bayesian Network Meta-Analysis
收稿日期: 2021-04-01      出版日期: 2021-12-23
ZTFLH:  G353  
基金资助:*国家自然科学基金青年项目(71804102);国家自然科学基金面上项目(71573162);山西省高等学校哲学社会科学研究项目(2019W040)
通讯作者: 于琦,ORCID:0000-0001-9154-0229     E-mail: yuqi@sxmu.edu.cn
引用本文:   
吴胜男, 蒲虹君, 田若楠, 梁雯琪, 于琦. 网络结构对链路预测算法的影响研究*——基于元分析视角[J]. 数据分析与知识发现, 2021, 5(11): 102-113.
Wu Shengnan, Pu Hongjun, Tian Ruonan, Liang Wenqi, Yu Qi. Network Structure’s Impacts on Link Prediction Algorithm from Meta-Analysis Perspective. Data Analysis and Knowledge Discovery, 2021, 5(11): 102-113.
链接本文:  
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.0323      或      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I11/102
序号 网络 节点数 边数 网络直径 网络密度 平均度 聚集系数 平均(最短)距离
1 Karate 34 78 5 0.139 0 4.588 0.588 2.408
2 Train Bombing 64 243 6 0.121 0 8.812 0.711 2.691
3 Net Science 379 914 17 0.013 0 4.823 0.798 6.042
4 US Airports 332 2 126 6 0.039 0 12.807 0.749 2.738
5 Karate(unweighted) 34 77 5 0.137 0 4.529 0.574 2.426
6 Dolphins 62 159 6 0.084 0 5.129 0.290 3.111
7 Football 115 613 4 0.094 0 10.661 0.403 2.508
8 C.elegan 297 2 148 5 0.049 0 14.465 0.308 2.455
9 FWFW 128 2 075 3 0.255 3 32.422 0.335 1.776
10 Polblogs 1 490 19 025 8 0.015 0 22.436 0.360 2.738
11 Jazz 198 2 742 6 0.140 6 27.700 0.618 2.235
12 中国文学 91 163 8 0.039 8 3.580 0.570 4.160
13 历史学 24 38 2 0.137 7 3.170 0.910 1.860
14 法学 56 69 7 0.044 0 2.460 0.430 3.850
15 教育学 2 556 3 647 44 0.001 1 2.850 0.570 16.180
16 图书馆情报文献学 9 471 19 104 22 0.000 4 4.030 0.680 7.340
17 经济学 14 900 23 746 26 0.000 2 3.180 0.590 9.330
18 管理学 10 878 18 360 27 0.000 3 3.380 0.690 9.990
19 Citeseer 3 312 4 732 28 0.001 0 2.857 0.257 9.036
20 DBLP 3 119 39 516 14 0.008 0 25.339 0.259 4.199
21 Cora 2 708 5 429 19 0.001 0 4.010 0.293 6.310
22 Wiki 2 405 17 981 9 0.006 0 14.953 0.480 3.650
Table 1  纳入研究的网络数据集
主效应 k 主效应及CI 异质性检验
MD 95%CI 标准误差 I2/% Q Pval
总效应 278 1.183 2 (1.000 5,1.365 9) 0.092 8 0.333/98.77% 12 812.024 7 <0.000 1
水平2 28 1.213 4 (1.033 1,1.393 7) 0.091 6 0.108/32.14% —— <0.000 1
水平3 278 1.231 5 (1.168 9,1.294 2) 0.031 8 0.225/66.64% —— <0.000 1
Table 2  主效应与异质性
截距 95%CI t Pval
-5.697 0 (-7.461 0,3.933 0) -6.212 0 0.000 0
Table 3  Egger’s检验结果
Fig.1  剪补后的漏斗图
调节变量 回归系数 标准误差 t Pval CI
截距 0.593 0 0.323 2 1.834 9 0.067 6 (-0.043 3,1.229 2)
节点 0.000 0 0.000 0 -0.120 2 0.904 4 (-0.000 1,0.000 1)
0.000 0 0.000 0 -1.275 5 0.203 2 (0.000 0,0.000 0)
网络直径 -0.010 0 0.049 4 -0.203 3 0.839 1 (-0.107 2,0.087 1)
网络密度 -7.271 8 1.860 6 -3.908 4 0.000 1 (-10.934 8,-3.608 7)
平均度 0.034 9 0.013 8 2.522 5 0.012 2 (0.007 7,0.062 2)
聚集系数 1.297 6 0.437 0 2.969 7 0.003 2 (0.437 3,2.157 9)
平均(最短)距离 0.057 6 0.149 4 0.385 7 0.700 0 (-0.236 5,0.351 7)
Table 4  样本量与网络结构特征对异质性的影响
调节变量 分组
标准
研究
数量
合并MD值(95%CI) 标准误差 t Q Pval 水平2方差I2/% 水平3方差I2/%
网络密度 >0.06 54 0.777 4 (0.461 9,1.093 0) 0.157 3 4.941 3 1 517.564 1 <0.000 1 0.139/38.96 0.209/58.15
≤0.06 224 1.393 7 (1.237 5,1.549 9) 0.079 3 17.580 7 10 236.787 6 <0.000 1 0.101/49.45 0.100/48.81
平均度 >10.02 98 1.136 1 (0.895 6,1.376 6) 0.121 2 9.374 8 2 736.970 8 <0.000 1 0.076/33.95 0.146/64.70
≤10.02 180 1.212 6 (0.949 7,1.475 5) 0.133 2 9.103 2 8 971.691 6 <0.000 1 0.126/30.59 0.280/68.19
聚集系数 >0.49 102 1.306 0 (0.966 7,1.645 2) 0.171 0 7.635 4 6 769.819 0 <0.000 1 0.127/26.21 0.355/73.10
≤0.49 176 1.082 5 (0.911 1,1.253 9) 0.086 9 12.463 7 3 783.386 9 <0.000 1 0.094/47.87 0.098/49.76
Table 5  分层三水平元分析结果
Fig.2  网络密度对各算法预测精度的影响
Fig.3  平均度对各算法预测精度的影响
Fig.4  聚集系数对各算法预测精度的影响
[1] Watts D J, Strogatz S H. Collective Dynamics of ‘Small-World’ Networks[J]. Nature, 1998, 393(6684):440-442.
doi: 10.1038/30918
[2] Chen S Y, Huang W, Cattani C, et al. Traffic Dynamics on Complex Networks: A Survey[J]. Mathematical Problems in Engineering, 2012: Article No.732698.
[3] 张彦超, 刘云, 张海峰, 等. 基于在线社交网络的信息传播模型[J]. 物理学报, 2011, 60(5):66-72.
[3] (Zhang Yanchao, Liu Yun, Zhang Haifeng, et al. The Research of Information Dissemination Model on Online Social Network[J]. Acta Physica Sinica, 2011, 60(5):66-72.)
[4] 吕琳媛. 复杂网络链路预测[J]. 电子科技大学学报, 2010, 39(5):651-661.
[4] (Lv Linyuan. Link Prediction on Complex Networks[J]. Journal of University of Electronic Science and Technology of China, 2010, 39(5):651-661.)
[5] Mamitsuka H. Mining from Protein-Protein Interactions[J]. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2012, 2(5):400-410.
doi: 10.1002/widm.1065
[6] 张斌, 马费成. 科学知识网络中的链路预测研究述评[J]. 中国图书馆学报, 2015, 41(3):99-113.
[6] (Zhang Bin, Ma Feicheng. A Review on Link Prediction of Scientific Knowledge Network[J]. Journal of Library Science in China, 2015, 41(3):99-113.)
[7] Kossinets G. Effects of Missing Data in Social Networks[J]. Social Networks, 2006, 28(3):247-268.
doi: 10.1016/j.socnet.2005.07.002
[8] Ren X L, Lv L, Liu R R, et al. Avoiding Congestion in Recommender Systems[J]. New Journal of Physics, 2014, 16(6):063057.
doi: 10.1088/1367-2630/16/6/063057
[9] Ahmed H S, Faouzi B M, Caelen J. Detection and Classification of the Behavior of People in an Intelligent Building by Camera[J]. International Journal on Smart Sensing and Intelligent Systems, 2013, 6(4):1317-1342.
doi: 10.21307/ijssis-2017-592
[10] Kaya B, Poyraz M. Unsupervised Link Prediction in Evolving Abnormal Medical Parameter Networks[J]. International Journal of Machine Learning and Cybernetics, 2016, 7(1):145-155.
doi: 10.1007/s13042-015-0405-y
[11] 马代川, 罗代兵. 基于网络链路预测的药物分子重定位研究[J]. 化学研究与应用, 2020, 32(5):752-757.
[11] (Ma Daichuan, Luo Daibing. Link Prediction Algorithm for Drug Repositioning Based on Structural Feature[J]. Chemical Research and Application, 2020, 32(5):752-757.)
[12] 吕琳媛, 任晓龙, 周涛. 网络链路预测: 概念与前沿[J]. 中国计算机学会通讯, 2016, 12(4):12-19.
[12] (Lv Linyuan, Ren Xiaolong, Zhou Tao. Network Link Prediction: Concepts and Frontiers[J]. Communications of CCF, 2016, 12(4):12-19.)
[13] Zhou T, Lv L, Zhang Y C. Predicting Missing Links via Local Information[J]. The European Physical Journal B, 2009, 71(4):623-630.
doi: 10.1140/epjb/e2009-00335-8
[14] Ahn M W, Jung W S. Accuracy Test for Link Prediction in Terms of Similarity Index: The Case of WS and BA Models[J]. Physica A: Statistical Mechanics and Its Applications, 2015, 429:177-183.
doi: 10.1016/j.physa.2015.01.083
[15] Gupta A K, Sardana N. Impact of Topological Properties over Link Prediction Based on Node Neighbourhood: A Study[C]// Proceedings of the 7th International Conference on Contemporary Computing (IC3). IEEE, 2014: 194-198.
[16] Feng X, Zhao J C, Xu K. Link Prediction in Complex Networks: A Clustering Perspective[J]. The European Physical Journal B, 2012, 85:3.
doi: 10.1140/epjb/e2011-20207-x
[17] 贾珺, 胡晓峰, 贺筱媛. 网络结构特征与链路预测算法关系研究[J]. 复杂系统与复杂性科学, 2017, 14(1):28-37.
[17] (Jia Jun, Hu Xiaofeng, He Xiaoyuan. On the Relationship Between Network Structure Features and Link Prediction Algorithms[J]. Complex Systems and Complexity Science, 2017, 14(1):28-37.)
[18] Liu M M, Guo J F, Luo X. Link Prediction Based on the Similarity of Transmission Nodes of Multiple Paths in Weighted Social Networks[J]. Journal of Information Hiding and Multimedia Signal Processing, 2016, 7(4):771-780.
[19] 陈佳璐, 钱宇华, 张晓琴, 等. 依据节点贡献的链路预测方法[J]. 小型微型计算机系统, 2016, 37(1):92-95.
[19] (Chen Jialu, Qian Yuhua, Zhang Xiaoqin, et al. Link Prediction Method According to Node Contribution[J]. Journal of Chinese Computer Systems, 2016, 37(1):92-95.)
[20] 张斌, 李亚婷, 戴怡清. 学科合作网络的链路挖掘与应用分析[J]. 情报理论与实践, 2018, 41(9):108-113.
[20] (Zhang Bin, Li Yating, Dai Yiqing. Link Mining and Application Analysis of Disciplinary Collaboration Network[J]. Information Studies: Theory & Application, 2018, 41(9):108-113.)
[21] Yang Y L, Ye Z L, Zhao H X, et al. Link Prediction Based on Gravitational Field of Complex Network[C]// Proceedings of 2019 IEEE International Conference on Dependable, Autonomic and Secure Computing, International Conference on Pervasive Intelligence and Computing, International Conference on Cloud and Big Data Computing, International Conference on Cyber Science and Technology Congress. IEEE, 2019: 499-504.
[22] 杨燕琳, 冶忠林, 赵海兴, 等. 基于高阶近似的链路预测算法[J]. 计算机应用, 2019, 39(8):2366-2373.
[22] (Yang Yanlin, Ye Zhonglin, Zhao Haixing, et al. Link Prediction Algorithm Based on High-Order Proximity Approximation[J]. Journal of Computer Applications, 2019, 39(8):2366-2373.)
[23] 王慧, 乐孜纯, 龚轩, 等. 基于特征分类的链路预测方法综述[J]. 计算机科学, 2020, 47(8):302-312.
[23] (Wang Hui, Le Zichun, Gong Xuan, et al. Review of Link Prediction Methods Based on Feature Classification[J]. Computer Science, 2020, 47(8):302-312.)
[24] Newman M E. Clustering and Preferential Attachment in Growing Networks[J]. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics, 2001, 64(2):025102.
doi: 10.1103/PhysRevE.64.025102
[25] Adamic L A, Adar E. Friends and Neighbors on the Web[J]. Social Networks, 2003, 25(3):211-230.
doi: 10.1016/S0378-8733(03)00009-1
[26] Barabási A L, Albert R. Emergence of Scaling in Random Networks[J]. Science, 1999, 286(5439):509-512.
doi: 10.1126/science.286.5439.509
[27] Katz L. A New Status Index Derived from Sociometric Analysis[J]. Psychometrika, 1953, 18(1):39-43.
doi: 10.1007/BF02289026
[28] Leicht E A, Holme P, Newman M E J. Vertex Similarity in Networks[J]. Physical Review E, 2006, 73(2):026120.
doi: 10.1103/PhysRevE.73.026120
[29] Lv L, Jin C H, Zhou T. Similarity Index Based on Local Paths for Link Prediction of Complex Networks[J]. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics, 2009, 80(4):046122.
doi: 10.1103/PhysRevE.80.046122
[30] Liu W P, Lv L. Link Prediction Based on Local Random Walk[J]. EPL (EuroPhysics Letters), 2010, 89(5):58007.
doi: 10.1209/0295-5075/89/58007
[31] Symeonidis P, Tiakas E, Manolopoulos Y. Transitive Node Similarity for Link Prediction in Social Networks with Positive and Negative Links[C]// Proceedings of the 4th ACM Conference on Recommender Systems. 2010.
[32] Fredman M L, Tarjan R E. Fibonacci Heaps and Their Uses in Improved Network Optimization Algorithms[C]// Proceedings of the 25th Annual Symposium on Foundations of Computer Science. 1984. DOI: 10.1109/SFCS.1984.715934.
doi: 10.1109/SFCS.1984.715934
[33] Chebotarev P, Shamis E. The Matrix-Forest Theorem and Measuring Relations in Small Social Groups[J]. Automation & Remote Control, 2006, 58(9):1505-1514.
[34] 王润芳, 陈增强, 刘忠信. 融合朴素贝叶斯方法的复杂网络链路预测[J]. 智能系统学报, 2019, 14(1):99-107.
[34] (Wang Runfang, Chen Zengqiang, Liu Zhongxin. Link Prediction in Complex Networks with Syncretic Naive Bayes Methods[J]. CAAI Transactions on Intelligent Systems, 2019, 14(1):99-107.)
[35] Gupta A K, Sardana N. Significance of Clustering Coefficient over Jaccard Index[C]// Proceedings of the 8th International Conference on Contemporary Computing. 2015: 463-466.
[36] 张斌, 李亚婷, 戴怡清. 聚集系数对合著网络链路预测效果的影响研究[J]. 情报理论与实践, 2018, 41(1):100-104.
[36] (Zhang Bin, Li Yating, Dai Yiqing. Research on the Influence of Clustering Coefficient on the Link Prediction in Collaboration Networks[J]. Information Studies: Theory & Application, 2018, 41(1):100-104.)
[37] Yan E J, Guns R. Predicting and Recommending Collaborations: An Author-, Institution-, and Country-Level Analysis[J]. Journal of Informetrics, 2014, 8(2):295-309.
doi: 10.1016/j.joi.2014.01.008
[38] Assink M, Wibbelink C J M. Fitting Three-Level Meta-Analytic Models in R: A Step-by-Step Tutorial[J]. The Quantitative Methods for Psychology, 2016, 12(3):154-174.
doi: 10.20982/tqmp.12.3.p154
[39] Glass G V. Primary, Secondary, and Meta-Analysis of Research[J]. Educational Research, 1976, 5(10):3-8.
[40] 刘志辉, 李文绚. 元分析方法研究综述[J]. 情报工程, 2016, 2(6):31-38.
[40] (Liu Zhihui, Li Wenxuan. Literature Review on Meta-Analysis[J]. Technology Intelligence Engineering, 2016, 2(6):31-38.)
[41] 龙雄伟. 复杂网络视角下我国产业关联结构研究[J]. 商业经济研究, 2017(20):180-182.
[41] (Long Xiongwei. Study on the Industrial Correlation Structure in China from the Aspects of Complex Network[J]. Journal of Commercial Economics, 2017 (20):180-182.)
[42] 沈旭慧, 王珍. 非比较研究及前后对照研究结局变量Meta分析的效应量选择[J]. 中国组织工程研究与临床康复, 2011, 15(4):733-736.
[42] (Shen Xuhui, Wang Zhen. Effect Size Selection for Meta Analysis of Outcome Variables in Noncomparative Studies and Pre-post Contrasts Studies[J]. Journal of Clinical Rehabilitative Tissue Engineering Research, 2011, 15(4):733-736.)
[43] Borenstein M, Hedges L V, Higgins J P T, et al. Introduction to Meta-Analysis[M]. Chichester, UK: John Wiley & Sons, Ltd, 2009.
[44] van Valkenhoef G, Lu G B, de Brock B, et al. Automating Network Meta-Analysis[J]. Research Synjournal Methods, 2012, 3(4):285-299.
[45] 沃特·德·诺伊, 安德烈·姆尔瓦, 弗拉迪米尔·巴塔盖尔吉, 等. 蜘蛛: 社会网络分析技术[M]. 林枫译. 北京: 世界图书出版公司, 2012.
[45] (Wouter D N, Andrej M, Vladimir B, et al. Exploratory Social Network Analysis with Pajek[M]. Translated by Lin Feng. Beijing: World Publishing Corporation, 2012.)
[46] 李欣, 温阳, 黄鲁成, 等. 多层网络分析视域下的新兴技术研发合作网络演化特征研究[J]. 情报杂志, 2021, 40(1):62-70.
[46] (Li Xin, Wen Yang, Huang Lucheng, et al. Research on Evolution Characteristics of Emerging Technology R&D Cooperation Network from the Perspective of Multi-level Network Analysis[J]. Journal of Intelligence, 2021, 40(1):62-70.)
[47] 刘建香. 复杂网络及其在国内研究进展的综述[J]. 系统科学学报, 2009, 17(4):31-37.
[47] (Liu Jianxiang. Complex Network and Review of Domestic Research[J]. Chinese Journal of Systems Science, 2009, 17(4):31-37.)
[1] 王松, 杨洋, 刘新民. 基于图注意力网络的开放式创新社区用户创意潜在价值发现研究*[J]. 数据分析与知识发现, 2021, 5(11): 89-101.
[2] 曾庆田,胡晓慧,李超. 融合主题词嵌入和网络结构分析的主题关键词提取方法 *[J]. 数据分析与知识发现, 2019, 3(7): 52-60.
[3] 黄炜,余辉,李岳峰. 国内网络反恐研究的现状、问题和展望*[J]. 现代图书情报技术, 2016, 32(11): 1-10.
[4] 夏立新,谭荧. LOD的网络结构分析与可视化*[J]. 现代图书情报技术, 2016, 32(1): 65-72.
[5] 王咏,倪波,袁勤俭,朱为勇,梁伟. 20世纪网络通信业的发展——IT技术世纪回眸之四[J]. 现代图书情报技术, 2000, 16(6): 14-17.
[6] 钱华林. 中国科技网的发展[J]. 现代图书情报技术, 1997, 13(3): 3-6.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
版权所有 © 2015 《数据分析与知识发现》编辑部
地址:北京市海淀区中关村北四环西路33号 邮编:100190
电话/传真:(010)82626611-6626,82624938
E-mail:jishu@mail.las.ac.cn