Liu Chunjiang1,2(),Li Shuying1,Hu Hanlin3,Fang Shu1,2
1Chengdu Library and Information Center, Chinese Academy of Sciences, Chengdu 610041, China 2Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China 3School of Public Administration, Sichuan School of Economics and Management, University, Chengdu 610065, China
【目的】作为存储网络数据的主流工具,图数据库在复杂网络分析中的研究与应用不断丰富,本文系统梳理了图数据库在复杂网络领域的应用进展和研究趋势。【文献范围】 以Web of Science核心数据库、Scopus、CNKI数据库为检索中英文文献的来源,调研了相关文献中涉及的15个图数据库及开源网站,详细整理了21个应用案例,精读综述了14篇研究论文。【方法】对比分析国内外主流图数据库,尝试探讨最新的图数据库解决方案在复杂网络分析中的应用,包括中心性、路径查找、链路预测、社区检测和图可视化等。【结果】图数据库已经成为复杂网络分析与大数据挖掘的重要分析工具与研究手段,不仅是复杂网络分析的一站式解决方案,还与图计算引擎等工具结合使用。【局限】 图数据库应用场景非常多,本文未能完整覆盖,仅选取2~3个有代表性的案例进行阐述。【结论】图数据库对于查询、表示和挖掘网络数据效果显著,能较为直观地分析和发现图结构中有意义的模式或结构,其对数据密集型的多维特征的呈现更接近现实,是未来挖掘隐含关系的重要工具。
[Objective] This paper systematically reviews the progress and trends of graph database research and applications for complex network analysis. [Coverage] We searched the Web of Science, Scopus, and CNKI database for Chinese and English literature. A total of 15 graph databases and open-source packages, 21 practical cases, and 14 research papers were retrieved. [Methods] First, we compared the mainstream graph database products from China and abroad. Then, we explored the latest solutions for complex network analysis, including algorithms (such as centrality, path finding, link prediction, and community detection), graph visualization, performance and related applications. [Results] The graph database has become an important analysis tool and research method for complex network analysis and big data mining. They also work closely with graph computing engines for complex network analysis. [Limitations] This paper only examined a few representative cases. [Conclusions] The graph database could effectively query, represent and analyze complex network data for their patterns or structures. Their presentation of multi-dimensional data is crucial for mining implicit relationships.
支持SB-Tree、Hash、Lucene Full Text、Lucene Spatial等4种索引算法
Table 3 国外主流图数据库
应用场景
类型
图算法/图布局
具体描述
中心性分析[25]
基于相邻节点
PageRank、ArticleRank、Eigenvector Centrality等
基于相邻节点的重要性计算当前节点的重要性
基于节点自身
度中心性(Degree Centrality)
基于与相邻节点直接相连的关系数量计算当前节点的重要性
接近中心性(Closeness Centrality)
基于与其他节点之间的平均距离计算当前节点的重要性
中介中心性(Between Centrality)
基于当前节点出现在任意两个节点的最短路径中的次数计算当前节点的重要性
路径查找[26]
路径遍历
广度优先搜索(Breadth First Search)
通过逐层遍历所有相邻节点从而找到最短路径
深度优先搜索(Depth First Search)
对起始节点的所有分支逐个遍历
固定路径查找
最短路径(Dijkstra、A*、Yen’s)
计算两个节点之间的最短路径
所有节点对最短路径(All Pairs Shortest Path)
计算所有节点之间的最短路径
单源最短路径(Single Source Shortest Path)
计算源节点与其他可达节点之间的最短路径
最小权重生成树(Minimum Weight Spanning Tree)
计算生成树中边权值和最小情况下的路径
非固定路径查找
随机游走(Random Walk)
在指定路径长度内,计算从一个节点开始按照随机或非随机的方式选择下一个节点的路径
链接预测[26]
基于邻居节点
所有邻居(Total Neighbors)
基于两个节点的邻居节点集合并集,计算两个节点的紧密度
连接偏好(Preferential Attachment)
对节点连接数有偏好,因此将两个节点的邻居节点集合数量相乘,计算两个节点的紧密度
基于共有邻居
共有邻居(Common Neighbors)
基于两个节点的邻居节点集合交集,计算两个节点的紧密度
资源优化(Resource Allocation)
基于共有邻居的相邻节点集合,并对集合数量进行非线性归一化处理,计算两个节点的紧密度
AA(Adamic Adar)
基于共有邻居的相邻节点集合,未对集合数量进行非线性归一化处理,计算两个节点的紧密度
基于共有社区
共有社区(Same Community)
基于社区检测算法判断两个节点是否属于相同社区,计算两个节点的紧密度
社区检测[27]
基于模块度
鲁汶(Louvain)
基于社区的模块度进行社区划分
基于网络动力学
标签传播(Label Propagation)
基于节点标签的传播进行社区划分
基于簇
弱联通社区(Weakly Connected Components)
在无向图中将任意节点间均存在路径的集合形成簇,计算出社区中所有的簇
强联通簇(Strongly Connected Components)
在有向图中将任意节点间均存在双向路径的集合形成簇,计算出社区中所有的簇
基于三角形
三角计数(Triangle Count)
计算社区中相连成三角形的节点集合
局部聚类系数(Local Clustering Coefficient)
基于节点的三角形和度这两个数值,计算节点的聚类系数
图可视化[28]
集成可视化
图布局类型比较简单,以力导向布局为主
通过图数据库集成的内置可视化工具
在线可视化
图布局类型最为丰富,包括力导向、地图、圆形、 时序、树状等布局
通过Web前端的JavaScript可视化工具
Table 4 复杂网络分析领域中的重要图算法/图布局
应用场景
主要算法
图数据库
预测重大疾病保险 欺诈[29]
度中心性、中介中心性、接近中心性、特征向量中心性
Neo4j
面向科技与能力的 网络分析[30]
度中心性、紧密中心性、中介中心性、特征向量中心性
HugeGraph
分析社交网络中心 节点[31]
中心性、PageRank
OrientDB
分析药品处方模式[32]
度中心性、中介中心性
Neo4j
侦查银行欺诈问题[33]
中心性、PageRank
TigerGraph
Table 5 图数据库在中心性分析中的应用
应用场景
主要算法
图数据库
分析城市转供电方案[34]
广度优先搜索、深度优先搜索
Neo4j
解决QoS感知的Web 服务组合问题[35]
Dijkstra算法
Neo4j
分析国家交通网络[36]
Yen的K条最短路径算法(KSP)
Neo4j
分析生物医学网络[37]
最短路径算法
OrientDB
Table 6 图数据库在路径查找问题中的应用
应用场景
主要算法
图数据库
结合网络结构相似度预测社会关系[38]
Common Neighbors
Neo4j
基于链路预测的生物医学知识发现[39]
Adamic-Adar
Neo4j
结合深度强化学习构建链路预测模型[40]
Common Neighbors、Adamic-Adar
OrientDB
基于链路预测的作者消歧[41]
Common Neighbors、Adamic-Adar
Neo4j
Table 7 图数据库在链路预测问题中的应用
应用场景
主要算法
图数据库
基于社区检测算法的在线金融欺诈检测[42]
Louvain、BMLPA
Neo4j
基于改进关系权重的科研社区挖掘[43]
Louvain
Neo4j
基于社区检测算法的社交网络分析[44]
Louvain、Edge Betweeness、Walktrap、CNM
Neo4j
基于社区检测算法的道路交通网络分析[45]
Louvain
Neo4j
Table 8 图数据库在社区检测问题中的应用
应用场景
应用方式
图可视化工具
图数据库
知识关联可视化[47]
在线可视化
Echarts.js
Neo4j
面向医学知识图谱的可视化[48]
集成可视化
Neo4j
Neo4j
生物医学数据检索及可视化[49]
在线可视化
Cytoscape.js
Neo4j
Twitter Troll 数据集可视化[50]
集成可视化
Neo4j
Neo4j
Table 9 图数据库在图可视化方面的应用
[1]
Alhussien I, Cambria E, Zhang N S. Semantically Enhanced Models for Commonsense Knowledge Acquisition[C]// Proceedings of the 2018 IEEE International Conference on Data Mining Workshops. IEEE, 2018: 1014-1021.
[2]
Paulheim H. Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods[J]. Semantic Web, 2016, 8: 489-508.
doi: 10.3233/SW-160218
( Shen Zhihong, Zhao Zihao, Wang Haibo. Big Data Technology Stack Shifting: From SQL Centric to Graph Centric[J]. Data Analysis and Knowledge Discovery, 2020, 4(7): 50-65.)
[4]
DBMS Popularity Broken Down by Database Model[EB/OL]. [2021-09-02]. https://db-engines.com/en/ranking_categories.
[5]
Šestak M, Heričko M, Družovec T W, et al. Applying K-Vertex Cardinality Constraints on a Neo4j Graph Database[J]. Future Generation Computer Systems, 2021, 115: 459-474.
doi: 10.1016/j.future.2020.09.036
[6]
Gutfraind A, Genkin M. A Graph Database Framework for Covert Network Analysis: An Application to the Islamic State Network in Europe[J]. Social Networks, 2017, 51: 178-188.
doi: 10.1016/j.socnet.2016.10.004
[7]
Breiger R L, Schoon E, Melamed D, et al. Comparative Configurational Analysis as a Two-Mode Network Problem: A Study of Terrorist Group Engagement in the Drug Trade[J]. Social Networks, 2014, 36: 23-39.
doi: 10.1016/j.socnet.2013.04.002
[8]
Chu Z, Yu J, Hamdulla A. A Novel Deep Learning Method for Query Task Execution Time Prediction in Graph Database[J]. Future Generation Computer Systems, 2020, 112: 534-548.
doi: 10.1016/j.future.2020.06.006
[9]
Gutfraind A, Genkin M. A Graph Database Framework for Covert Network Analysis: An Application to the Islamic State Network in Europe[J]. Social Networks, 2017, 51: 178-188.
doi: 10.1016/j.socnet.2016.10.004
( Qin Yue. Application and Contrast Research of Centrality-Based Algorithms in Complex Network Analysis—Take Text Network as an Example[D]. Tianjin: Tianjin University of Finance & Economics, 2020.)
[25]
Needham M, Hodler A E. Graph Algorithms: Practical Examples in Apache Spark and Neo4j[M]. O’Reilly Media, 2019.
[26]
Lü L Y, Zhou T. Link Prediction in Complex Networks: A Survey[J]. Physica A: Statistical Mechanics and Its Applications, 2011, 390(6): 1150-1170.
doi: 10.1016/j.physa.2010.11.027
( Zhou Xiaonan, Huang Lei, Wang Feiyue, et al. On the Application of Graph Database on Identifying Critical Illness Insurance Group Fraud[J]. Insurance Studies, 2020(9): 92-104.)
[30]
王猛. 面向科技与能力网络的关联分析系统[D]. 大连: 大连理工大学, 2020.
[30]
( Wang Meng. Association Analysis System for Technology and Ability Network[D]. Dalian: Dalian University of Technology, 2020.)
[31]
Kolomeets M, Chechulin A, Kotenko I. Social Networks Analysis by Graph Algorithms on the Example of the VKontakte Social Network[J]. Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications, 2019, 10: 55-75.
[32]
Giordani I, Archetti F, Candelieri A, et al. Graph Data Base: An Enabling Technology for Drug Prescription Patterns Analysis[J]. Statistica Applicata-Italian Journal of Applied Statistics, 2020 (2): 181-192.
[33]
Henderson R. Using Graph Databases to Detect Financial Fraud[J]. Computer Fraud & Security, 2020, 2020(7): 6-10.
( Wang Yiwei, Hu Linlin, Luo Cheng, et al. Generation of Urban High-Voltage Power Grid Transfer Power Supply Scheme Based on Integrated Graph Database Algorithm and Pattern Matching[J]. Electrical Measurement & Instrumentation, 2022, 59(4): 169-177.)
[35]
范国栋. 模糊决策与图数据库在服务组合中的研究与应用[D]. 淄博: 山东理工大学, 2020.
[35]
( Fan Guodong. Research and Application of Fuzzy Decision and Graph Database in Service Composition[D]. Zibo: Shandong University of Technology, 2020.)
[36]
Permana S D H, Bintoro K B Y, Arifitama B, et al. Comparative Analysis of Pathfinding Algorithms A*, Dijkstra, and BFS on Maze Runner Game[J]. International Journal of Information Systems & Technology, 2018, 1(2): 1-8.
[37]
Lysenko A, Roznovăţ I A, Saqi M, et al. Representing and Querying Disease Networks Using Graph Databases[J]. BioData Mining, 2016, 9: 23.
doi: 10.1186/s13040-016-0102-8
pmid: 27462371
[38]
郭坤铭. 基于异构网络的关系推理与预测方法研究[D]. 太原: 太原理工大学, 2017.
[38]
( Guo Kunming. Research on Relation Inference and Prediction in Heterogeneous Network[D]. Taiyuan: Taiyuan University of Technology, 2017.)
( Hu Zhengyin, Liu Leilei, Dai Bing, et al. Discovering Subject Knowledge in Life and Medical Sciences with Knowledge Graph[J]. Data Analysis and Knowledge Discovery, 2020, 4(11): 1-14.)
[40]
Lim M, Abdullah A, Jhanjhi N. Performance Optimization of Criminal Network Hidden Link Prediction Model with Deep Reinforcement Learning[J]. Journal of King Saud University-Computer and Information Sciences, 2021, 33(10): 1202-1210.
doi: 10.1016/j.jksuci.2019.07.010
[41]
Franzoni V, Lepri M, Milani A. Topological and Semantic Graph-Based Author Disambiguation on DBLP Data in Neo4j[OL]. arXiv Preprint, arXiv: 1901.08977.
[42]
施朝浩. 基于图特征的欺诈检测方法研究与应用[D]. 杭州: 浙江大学, 2019.
[42]
( Shi Chaohao. Research and Application of Fraud Detection Method Based on Graph Features[D]. Hangzhou: Zhejiang University, 2019.)
[43]
杜伟静, 李翀, 王宇宸, 等. Web of Science科研社区挖掘算法研究[J]. 小型微型计算机系统, 2020, 41(12): 2465-2469.
[43]
( Du Weijing, Li Chong, Wang Yuchen, et al. Research on Web of Science Academic Community Mining Algorithm[J]. Journal of Chinese Computer Systems, 2020, 41(12): 2465-2469.)
[44]
Drakopoulos G, Gourgaris P, Kanavos A. Graph Communities in Neo4j[J]. Evolving Systems, 2020, 11(3): 397-407.
doi: 10.1007/s12530-018-9244-x
[45]
Rashmi R, Champawat S, Teja G V, et al. Analysis of Road Networks Using the Louvain Community Detection Algorithm[A]//Soft Computing for Problem Solving[M]. Springer, 2020: 749-757.
( Ma Yumeng, Wang Fang, Huang Jinxia, et al. Research on Construction of a Subject Knowledge Base Based on Literature Knowledge Extraction: Using the Knowledge Base of Activating Blood Circulation and Removing Stasis as the Object[J]. Journal of the China Society for Scientific and Technical Information, 2019, 38(5): 482-491.)
[48]
马欢欢. 基于电子病历的癫痫医学知识图谱构建的研究[D]. 曲阜: 曲阜师范大学, 2020.
[48]
( Ma Huanhuan. Research on the Construction of Epilepsy Medical Knowledge Graph Based on Electronic Medical Records[D]. Qufu: Qufu Normal University, 2020.)
[49]
Messina A, Fiannaca A la Paglia L, et al. BioGraph: A Web Application and a Graph Database for Querying and Analyzing Bioinformatics Resources[J]. BMC Systems Biology, 2018, 12(S5): 75-89.
doi: 10.1186/s12918-018-0597-3
[50]
Allen D, Hodler A, Hunger M, et al. Understanding Trolls with Efficient Analytics of Large Graphs in Neo4j[C]// Proceedings of the 2019 Datenbanksysteme für Business, Technologie und Web, 2019: 377-396.