Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (4): 59-70    DOI: 10.11925/infotech.2096-3467.2017.1162
Original Article
Ranking Scholarly Impacts Based on Citations and Academic Similarity
Liu Junwan, Yang Bo, Wang Feifei
School of Economics and Management, Beijing University of Technology, Beijing 100124, China
Abstract  

[Objective] This study aims to establish a fair and objective mechanism for evaluating academic impact, addressing problems such as unwieldy appraisal systems, complicated calculations, and vague conclusions. [Methods] We propose a method that ranks each scholar's impact based on citation behavior and academic similarity, using the Word2Vec, TF-IDF, and PageRank algorithms. [Results] The proposed method combines the influence of a researcher's scholarly relationships with that of his or her academic output. It performs well on validity: the correlations of the H-index and eigenvector centrality with the PR value were 0.872 and 0.617, respectively. The proposed evaluation index could replace the traditional metrics. The average H-index and the average citation frequency of scholars within a fixed ranking interval both increased. For the top 100 scholars, the average H-index rose by 1.087 and the average citation frequency rose by 2.080, both improvements over the original PageRank algorithm. [Limitations] The proposed algorithm is less efficient than the PageRank algorithm. [Conclusions] The new algorithm can be used to analyze academic networks with a large number of nodes, and a node's PR value becomes more accurate as network quality improves. The new ranking algorithm can therefore effectively evaluate the academic impact of many scholars across multiple disciplines, and it performs better than existing methods.

Key words: Citation Network; Academic Similarity; Academic Influence; Ranking Method
Received: 20 November 2017      Published: 11 May 2018
Chinese Library Classification (ZTFLH): G353.1

Cite this article:

Liu Junwan, Yang Bo, Wang Feifei. Ranking Scholarly Impacts Based on Citations and Academic Similarity. Data Analysis and Knowledge Discovery, 2018, 2(4): 59-70.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.1162     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I4/59

Citing scholar | Citing institution | Cited scholar | Cited institution | Academic similarity | Citation count
Durbin Richard | Wellcome Trust Sanger Inst | Prokopenko Inga | Univ Oxford | 0.56292 | 7
Durbin Richard | Wellcome Trust Sanger Inst | Muzny Donna | Baylor Coll Med | 0.85074 | 1
Durbin Richard | Wellcome Trust Sanger Inst | Raitakari Olli | Univ Turku | 0.58119 | 3
Durbin Richard | Wellcome Trust Sanger Inst | Durbin Richard | Wellcome Trust Sanger Inst | None | 34
Durbin Richard | Wellcome Trust Sanger Inst | Biesecker Leslie | NHGRI | 0.61436 | 1
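
The rows above pair each citation link with a similarity score and a citation count, which is the kind of input a weighted PageRank needs. The following is a minimal sketch under stated assumptions, not the authors' implementation: the edge weight (similarity multiplied by citation count), the decision to skip self-citations, and the damping factor of 0.85 are all assumptions made for illustration, since the page does not say how the two quantities are combined.

```python
from collections import defaultdict

# Rows from the table above: (citing, cited, similarity, citation count).
EDGES = [
    ("Durbin Richard", "Prokopenko Inga", 0.56292, 7),
    ("Durbin Richard", "Muzny Donna", 0.85074, 1),
    ("Durbin Richard", "Raitakari Olli", 0.58119, 3),
    ("Durbin Richard", "Durbin Richard", None, 34),  # self-citation
    ("Durbin Richard", "Biesecker Leslie", 0.61436, 1),
]


def edge_weights(edges):
    """ASSUMPTION: weight = similarity * citation count; self-citations
    and links without a positive similarity are skipped."""
    weights = defaultdict(dict)
    for citing, cited, sim, count in edges:
        if citing == cited or not sim:
            continue
        weights[citing][cited] = sim * count
    return weights


def weighted_pagerank(edges, damping=0.85, iterations=100):
    """Power iteration for PageRank on a weighted citation graph."""
    weights = edge_weights(edges)
    nodes = sorted({name for edge in edges for name in edge[:2]})
    n = len(nodes)
    rank = {name: 1.0 / n for name in nodes}
    for _ in range(iterations):
        # Rank held by scholars with no outgoing links is spread uniformly.
        dangling = sum(rank[name] for name in nodes if not weights.get(name))
        new_rank = {
            name: (1 - damping) / n + damping * dangling / n for name in nodes
        }
        for citing, targets in weights.items():
            total = sum(targets.values())
            for cited, weight in targets.items():
                new_rank[cited] += damping * rank[citing] * weight / total
        rank = new_rank
    return rank


ranks = weighted_pagerank(EDGES)
for name, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.4f}")
```

With this weighting, a cited scholar gains more from a link that is both frequent and topically close, which is one plausible reading of "citations and academic similarity".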
Rank | Name | PR
1 | Boerwinkle, Eric | 0.004715
2 | de Jager, Philip L. | 0.004254
3 | Meitinger, Thomas | 0.004173
4 | Hirschhorn, Joel N. | 0.003937
5 | Aung, Tin | 0.003816
6 | Alkuraya, Fowzan S. | 0.003772
7 | Shin, Hyoung Doo | 0.003658
8 | Majewski, Jacek | 0.003624
9 | Robert, Catherine | 0.003564
10 | Palotie, Aarno | 0.003561
11 | Eriksson, Johan G | 0.003537
12 | Ophoff, Roel A | 0.003181
13 | Raitakari, Olli T | 0.003118
14 | Hakonarson, Hakon | 0.002978
15 | Montgomery, Grant W | 0.002938
16 | Daly, Mark J | 0.002913
17 | Munnich, Arnold | 0.002875
18 | de Bakker, Paul I. W | 0.002837
19 | Martin, Nicholas G | 0.002638
20 | Illig, Thomas | 0.002637
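Data | Operation | Time | Unit
Training set | Data preprocessing | 3.74 | hours
Training set | Word2Vec model training | 7.46 | hours
Test set | Data preprocessing | 27.13 | minutes
Test set | TF-IDF computation | 2.52 | minutes
Test set | Auth2Vec academic similarity computation | 4.12 | minutes
Test set | Citation network construction | 12.79 | minutes
Test set | PageRank ranking | 4.42 | minutes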
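
The timing table lists TF-IDF computation and Auth2Vec similarity computation as separate steps. One plausible way to combine them, shown here only as a hedged sketch, is to build each author's vector as a TF-IDF-weighted average of word vectors and compare authors by cosine similarity. Everything specific below is an assumption: the two-dimensional toy word vectors, the token lists, the author labels, and the smoothed IDF formula are hypothetical and do not come from the paper.

```python
import math
from collections import Counter

# HYPOTHETICAL toy inputs: tiny 2-d "word vectors" and per-author tokens.
WORD_VECTORS = {
    "genome": [1.0, 0.0],
    "variant": [0.8, 0.2],
    "network": [0.0, 1.0],
}
AUTHOR_TOKENS = {
    "author_a": ["genome", "variant", "genome"],
    "author_b": ["variant", "genome"],
    "author_c": ["network", "network", "variant"],
}


def tfidf(tokens_by_author):
    """Term frequency times a smoothed IDF, log((1 + N) / (1 + df)) + 1."""
    n_docs = len(tokens_by_author)
    doc_freq = Counter(t for toks in tokens_by_author.values() for t in set(toks))
    scores = {}
    for author, toks in tokens_by_author.items():
        counts = Counter(toks)
        scores[author] = {
            term: (count / len(toks))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return scores


def author_vector(term_weights, word_vectors):
    """ASSUMPTION: author vector = TF-IDF-weighted mean of word vectors."""
    dim = len(next(iter(word_vectors.values())))
    total = sum(term_weights.values())
    vec = [0.0] * dim
    for term, weight in term_weights.items():
        for i, value in enumerate(word_vectors[term]):
            vec[i] += weight * value / total
    return vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


WEIGHTS = tfidf(AUTHOR_TOKENS)
VECTORS = {a: author_vector(w, WORD_VECTORS) for a, w in WEIGHTS.items()}


def similarity(a, b):
    """Cosine similarity between two authors' weighted vectors."""
    return cosine(VECTORS[a], VECTORS[b])


print(f"a vs b: {similarity('author_a', 'author_b'):.3f}")
print(f"a vs c: {similarity('author_a', 'author_c'):.3f}")
```

In a real pipeline the word vectors would come from the trained Word2Vec model rather than a hand-written dictionary.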
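Indicator | Statistic | PR value | H-index | Eigenvector centrality
PR value | Pearson correlation coefficient | 1 | 0.617** | 0.872**
PR value | Significance (two-tailed) | 0 | 0 | 0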
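
The coefficients above are Pearson correlations between the PR value and the other two indicators. As a reminder of what that statistic measures, here is a minimal sketch of the computation. The sample values are hypothetical and are not the paper's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL values for five scholars, for illustration only.
pr_values = [0.0047, 0.0042, 0.0031, 0.0026, 0.0012]
h_indexes = [60, 48, 51, 30, 22]

print(round(pearson(pr_values, h_indexes), 3))
```

A value close to 1 means scholars with a higher PR value also tend to have a higher value on the other indicator.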
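Name | Number of papers | Total citations | Average citations per paper | Highest citations for a single paper | Nature/Science papers
Boerwinkle, Eric | 240 | 14722 | 61.34 | 1441 | 15
de Jager, Philip L | 97 | 5143 | 53.02 | 820 | 11
Meitinger, Thomas | 173 | 13386 | 77.38 | 1441 | 11
Hirschhorn, Joel N | 146 | 9428 | 64.58 | 1441 | 13
Aung, Tin | 53 | 3168 | 59.77 | 340 | 1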
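
The average citation figures in the table above follow directly from the other two columns: total citations divided by number of papers. The short check below recomputes them from the table's own numbers. The function and variable names are illustrative only.

```python
# Rows copied from the table above:
# (name, number of papers, total citations, reported average).
ROWS = [
    ("Boerwinkle, Eric", 240, 14722, 61.34),
    ("de Jager, Philip L", 97, 5143, 53.02),
    ("Meitinger, Thomas", 173, 13386, 77.38),
    ("Hirschhorn, Joel N", 146, 9428, 64.58),
    ("Aung, Tin", 53, 3168, 59.77),
]


def average_citations(total_citations, papers):
    """Average citations per paper, rounded to two decimals."""
    return round(total_citations / papers, 2)


for name, papers, total, reported in ROWS:
    computed = average_citations(total, papers)
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```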