Influence of Network Structure Changes on Co-word Network Link Prediction
Chen Zhuo1, Jiang Xixi1, Zhang Xiaojuan2
1School of Computer and Information Science, Southwest University, Chongqing 400715, China
2School of Public Administration, Sichuan University, Chengdu 610065, China
[Objective] This study examines how changes in co-word network structure affect link prediction based on similarity metrics.
[Methods] First, we randomly retrieved ISLS, LAW, BSS, COM, and Ocean literature from the Web of Science Core Collection (2015 to 2020). Second, using different keyword frequencies, we constructed co-word networks with differing topological features, such as the number of nodes and edges, average clustering coefficient, density, network transitivity, and average degree. Finally, we selected 15 traditional similarity metrics for link prediction (e.g., AA, CN, RWR, and Katz) and ran link prediction experiments on these co-word networks.
[Results] We compared and analyzed the prediction performance of the different similarity metrics as the network structure changed. (1) Across disciplines, in most cases, as the overall keyword frequency in a co-word network increases, the average clustering coefficient decreases while the density, network transitivity, average degree, average degree centrality, average betweenness centrality, and average closeness increase, and link prediction is more likely to perform poorly. Conversely, the larger the average clustering coefficient and the smaller the other topological measures, the better the link prediction performance. (2) Among the 15 selected similarity metrics, RWR performed best across co-word networks with different topological characteristics, and Katz showed the most stable prediction performance across co-word networks. The prediction results of every metric were most affected by changes in keyword frequency in the LAW discipline.
[Limitations] Due to limited computing resources, we used only one classification method and one evaluation index in this study. In addition, we did not explore some node similarity indicators (i.e., likelihood analysis-based metrics and probability model-based metrics).
[Conclusions] This study provides a theoretical foundation for selecting similarity metrics for co-word networks in different disciplines.
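To make the pipeline described in the abstract concrete, the following minimal Python sketch builds a small co-word network from keyword lists, computes a few of the topological features named above (number of nodes and edges, density, average degree, average clustering coefficient), and scores a candidate link with two of the listed similarity metrics, CN (Common Neighbors) and AA (Adamic-Adar). It is an illustration only and not the authors' implementation: the toy keyword lists, the function names, and the `min_freq` threshold used to vary the network by keyword frequency are all assumptions introduced here.

```python
from itertools import combinations
from math import log


def build_coword_network(papers, min_freq=1):
    """Build an undirected co-word network as {keyword: set(neighbors)}.

    `min_freq` is a hypothetical threshold: only keywords that appear in at
    least `min_freq` papers are kept as nodes.
    """
    freq = {}
    for keywords in papers:
        for kw in set(keywords):
            freq[kw] = freq.get(kw, 0) + 1
    kept = {kw for kw, count in freq.items() if count >= min_freq}
    adj = {kw: set() for kw in kept}
    # Two keywords are linked if they co-occur in at least one paper.
    for keywords in papers:
        for u, v in combinations(sorted(set(keywords) & kept), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def topology(adj):
    """Return a few basic topological features of the network."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2

    def local_clustering(u):
        nbrs = adj[u]
        k = len(nbrs)
        if k < 2:
            return 0.0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        return 2 * links / (k * (k - 1))

    return {
        "nodes": n,
        "edges": m,
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "average_degree": 2 * m / n if n else 0.0,
        "average_clustering": sum(local_clustering(u) for u in adj) / n if n else 0.0,
    }


def common_neighbors(adj, u, v):
    """CN score: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def adamic_adar(adj, u, v):
    """AA score: shared neighbors weighted by 1 / log(degree).

    Every shared neighbor has degree >= 2, so log(degree) is positive.
    """
    return sum(1 / log(len(adj[z])) for z in adj[u] & adj[v])


# Toy keyword lists (invented for illustration, not data from the study).
papers = [
    ["link prediction", "co-word network", "similarity"],
    ["link prediction", "random walk"],
    ["co-word network", "random walk"],
    ["similarity", "keyword"],
]

network = build_coword_network(papers)
features = topology(network)
cn_score = common_neighbors(network, "similarity", "random walk")
aa_score = adamic_adar(network, "similarity", "random walk")
print(features)
print(cn_score, round(aa_score, 4))
```

Raising `min_freq` in this sketch drops low-frequency keywords and so changes the topological features, which is the kind of structural variation the study relates to prediction performance. Global metrics such as Katz and RWR would need path or random-walk computations that are not shown here.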