Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (4): 59-70     https://doi.org/10.11925/infotech.2096-3467.2017.1162
Research Paper
Ranking Scholarly Impacts Based on Citations and Academic Similarity*
Liu Junwan, Yang Bo, Wang Feifei
School of Economics and Management, Beijing University of Technology, Beijing 100124, China
Abstract

[Objective] To address the oversized indicator systems, tedious computation, and vague conclusions that result from a proliferation of evaluation metrics, this study develops a fair, effective, and fast mechanism for ranking academic influence. [Methods] Combining the Word2Vec, TF-IDF, and PageRank algorithms, we propose a field-level ranking method for scholars' influence based on citation behavior and academic similarity. [Results] The improved ranking algorithm integrates influence at the level of scholarly relationships with influence at the level of scholarly output, and it performs well on validity: the PR value correlates with eigenvector centrality and with the H-index at 0.872 and 0.617, respectively, indicating that it can serve as a substitute for these traditional indicators. In addition, the average H-index and the average citation frequency of scholars within fixed ranking intervals both increase; for the top 100 scholars, the average H-index rises by 1.087 and the average citation frequency by 2.080, so the ranking outperforms the original PageRank algorithm. [Limitations] The time and space complexity of the algorithm are within an acceptable range, but it is less efficient than the original PageRank algorithm. [Conclusions] The improved algorithm suits scholarly networks with a large number of nodes, and node PR values become more accurate as the network grows. For evaluating academic influence across many disciplines and many scholars, the improved ranking algorithm can therefore substitute, to some extent, for existing indicators, and it performs better than the unmodified algorithm.

Keywords: Citation Network; Academic Similarity; Academic Influence; Ranking Method
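
The abstract names PageRank as the ranking component but this page does not give the modified formula. As a minimal sketch only, the block below shows one way a citation network with weighted edges could be ranked by power iteration. The function name `weighted_pagerank`, the damping factor of 0.85, the uniform treatment of dangling nodes, and the toy edge weights are all assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, iterations=100):
    """Rank nodes of a directed graph whose edges carry non-negative weights.

    `edges` maps (citing, cited) -> weight. How the weight is derived
    (for example from academic similarity) is an assumption left to the caller.
    """
    nodes = sorted({node for pair in edges for node in pair})
    out_weight = defaultdict(float)
    for (src, _dst), weight in edges.items():
        out_weight[src] += weight

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing weight is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for (src, dst), weight in edges.items():
            if weight > 0.0:
                new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank


# Toy network with made-up weights, for illustration only.
toy_edges = {("A", "B"): 0.9, ("A", "C"): 0.1, ("B", "C"): 0.5, ("C", "A"): 0.7}
pr = weighted_pagerank(toy_edges)
for node, value in sorted(pr.items(), key=lambda item: -item[1]):
    print(node, round(value, 4))
```

Normalising each node's outgoing weights keeps the ranks summing to 1, which makes PR values comparable across nodes in the same network.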
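Received: 2017-11-20      Published: 2018-05-11
Chinese Library Classification (ZTFLH):  G353.1
Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China Youth Project "Structure and Evolution Trends of Academicians' Scientific Collaboration Networks from a Symbiosis Perspective: The Case of Academicians of the Chinese and U.S. Academies of Sciences" (No. 71603015), the National Social Science Fund of China Youth Project "Comprehensive Evaluation of Academic Influence Based on Multidimensional Informetric Analysis" (No. 15CTQ023), and the Beijing Natural Science Foundation project "Identifying Emerging Trends Based on Structure Detection and Evolution of Technology Symbiosis Networks" (No. 9182001).
Cite this article:
Liu Junwan, Yang Bo, Wang Feifei. Ranking Scholarly Impacts Based on Citations and Academic Similarity[J]. Data Analysis and Knowledge Discovery, 2018, 2(4): 59-70.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2017.1162      or      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2018/V2/I4/59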
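  Schematic diagram of the CBOW model
  Technical roadmap of the field ranking method
  Flowchart of data collection and preprocessing
  Sample input of the Word2Vec training set
  Workflow of indicator calculation and network construction
  Distribution of hot terms among the top 5,000 authors by publication count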
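| Citing scholar | Citing institution | Cited scholar | Cited institution | Academic similarity | Citation count |
|---|---|---|---|---|---|
| Durbin Richard | Wellcome Trust Sanger Inst | Prokopenko Inga | Univ Oxford | 0.56292 | 7 |
| Durbin Richard | Wellcome Trust Sanger Inst | Muzny Donna | Baylor Coll Med | 0.85074 | 1 |
| Durbin Richard | Wellcome Trust Sanger Inst | Raitakari Olli | Univ Turku | 0.58119 | 3 |
| Durbin Richard | Wellcome Trust Sanger Inst | Durbin Richard | Wellcome Trust Sanger Inst | None | 34 |
| Durbin Richard | Wellcome Trust Sanger Inst | Biesecker Leslie | NHGRI | 0.61436 | 1 |
  Sample table of academic similarity between scholars in the field of genetics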
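
This page does not spell out how the academic similarity values are computed. One plausible reading of the Word2Vec and TF-IDF combination named in the abstract is to represent each scholar as a TF-IDF-weighted average of word vectors and compare scholars by cosine similarity. The sketch below follows that assumption; the helper names, the 2-dimensional toy vectors, and the term weights are hypothetical. Returning `None` for a scholar paired with themselves mirrors the self-citation row in the sample table.

```python
import math


def author_vector(term_weights, word_vectors):
    """Average word vectors, weighting each term by its (assumed) TF-IDF score."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for term, weight in term_weights.items():
        vec = word_vectors.get(term)
        if vec is None:
            continue  # term is missing from the embedding vocabulary
        for i, x in enumerate(vec):
            total[i] += weight * x
        weight_sum += weight
    if weight_sum == 0.0:
        return None
    return [x / weight_sum for x in total]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors, or None if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return None
    return dot / (norm_u * norm_v)


def pair_similarity(citing, cited, vectors):
    """Similarity for a citing/cited pair; self-citations get None."""
    if citing == cited:
        return None
    return cosine_similarity(vectors[citing], vectors[cited])


# Toy embeddings and weights, made up for illustration.
word_vectors = {"gene": [1.0, 0.0], "genome": [0.8, 0.6], "cohort": [0.0, 1.0]}
vectors = {
    "scholar_a": author_vector({"gene": 2.0, "genome": 1.0}, word_vectors),
    "scholar_b": author_vector({"genome": 1.0, "cohort": 1.0}, word_vectors),
}
sim_ab = pair_similarity("scholar_a", "scholar_b", vectors)
sim_aa = pair_similarity("scholar_a", "scholar_a", vectors)
print(sim_ab, sim_aa)
```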
  All-author network of the top 5,000 scholars by publication count
| Rank | Name | PR | Rank | Name | PR |
|---|---|---|---|---|---|
| 1 | Boerwinkle, Eric | 0.004715 | 11 | Eriksson, Johan G. | 0.003537 |
| 2 | de Jager, Philip L. | 0.004254 | 12 | Ophoff, Roel A. | 0.003181 |
| 3 | Meitinger, Thomas | 0.004173 | 13 | Raitakari, Olli T. | 0.003118 |
| 4 | Hirschhorn, Joel N. | 0.003937 | 14 | Hakonarson, Hakon | 0.002978 |
| 5 | Aung, Tin | 0.003816 | 15 | Montgomery, Grant W. | 0.002938 |
| 6 | Alkuraya, Fowzan S. | 0.003772 | 16 | Daly, Mark J. | 0.002913 |
| 7 | Shin, Hyoung Doo | 0.003658 | 17 | Munnich, Arnold | 0.002875 |
| 8 | Majewski, Jacek | 0.003624 | 18 | de Bakker, Paul I. W. | 0.002837 |
| 9 | Robert, Catherine | 0.003564 | 19 | Martin, Nicholas G. | 0.002638 |
| 10 | Palotie, Aarno | 0.003561 | 20 | Illig, Thomas | 0.002637 |
  Top 20 scholars by influence in the field of genetics
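| Dataset | Operation | Time | Unit |
|---|---|---|---|
| Training set | Data preprocessing | 3.74 | hours |
| Training set | Word2Vec model training | 7.46 | hours |
| Test set | Data preprocessing | 27.13 | minutes |
| Test set | TF-IDF computation | 2.52 | minutes |
| Test set | Auth2Vec academic similarity computation | 4.12 | minutes |
| Test set | Citation network construction | 12.79 | minutes |
| Test set | PageRank ranking | 4.42 | minutes |
  Time consumed by each step of the field ranking method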
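
The per-step timings above can be totalled by stage to see where the cost sits. The short block below only re-adds the reported figures; the variable names are illustrative and not from the paper.

```python
# Reported step timings, copied from the table above.
training_hours = {
    "data preprocessing": 3.74,
    "Word2Vec model training": 7.46,
}
test_minutes = {
    "data preprocessing": 27.13,
    "TF-IDF computation": 2.52,
    "Auth2Vec academic similarity computation": 4.12,
    "citation network construction": 12.79,
    "PageRank ranking": 4.42,
}

total_training_hours = sum(training_hours.values())
total_test_minutes = sum(test_minutes.values())

print(f"training stage: {total_training_hours:.2f} hours")
print(f"test stage: {total_test_minutes:.2f} minutes")
```

On these figures the one-off training stage dominates, while the ranking stage itself finishes in under an hour.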
  Scatter plot of eigenvector centrality versus PR value
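| | | PR value | H-index | Eigenvector centrality |
|---|---|---|---|---|
| PR value | Pearson correlation coefficient | 1 | .617** | .872** |
| PR value | Significance (two-tailed) | 0 | 0 | 0 |
  Correlation analysis of the indicators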
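
The coefficients in the correlation table are Pearson correlations between paired indicator values. A minimal sketch of that calculation follows; the input lists are toy numbers made up for illustration and are not the study's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values only: hypothetical PR values and H-indexes for four scholars.
toy_pr = [0.004, 0.003, 0.002, 0.001]
toy_h = [40, 35, 20, 10]
r = pearson(toy_pr, toy_h)
print(round(r, 3))
```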
  Scatter plot of H-index versus PR value
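| Name | Papers | Total citations | Average citations per paper | Highest citations for a single paper | Nature/Science papers |
|---|---|---|---|---|---|
| Boerwinkle, Eric | 240 | 14,722 | 61.34 | 1,441 | 15 |
| de Jager, Philip L. | 97 | 5,143 | 53.02 | 820 | 11 |
| Meitinger, Thomas | 173 | 13,386 | 77.38 | 1,441 | 11 |
| Hirschhorn, Joel N. | 146 | 9,428 | 64.58 | 1,441 | 13 |
| Aung, Tin | 53 | 3,168 | 59.77 | 340 | 1 |
  Publication statistics of the top 5 scholars in the field of genetics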
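
The average citation column is the total citation count divided by the number of papers. The block below recomputes it from the table's own figures as a consistency check; the function and variable names are illustrative.

```python
# (name, papers, total citations, reported average), copied from the table above.
top5 = [
    ("Boerwinkle, Eric", 240, 14722, 61.34),
    ("de Jager, Philip L.", 97, 5143, 53.02),
    ("Meitinger, Thomas", 173, 13386, 77.38),
    ("Hirschhorn, Joel N.", 146, 9428, 64.58),
    ("Aung, Tin", 53, 3168, 59.77),
]


def average_citations(total_citations, paper_count):
    """Mean citations per paper."""
    return total_citations / paper_count


checks = [
    (name, round(average_citations(total, papers), 2), reported)
    for name, papers, total, reported in top5
]
for name, computed, reported in checks:
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```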
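  Distribution of indicator differences across ranking intervals between the improved ranking algorithm and the original PageRank algorithm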
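
The abstract reports gains such as a 1.087 rise in the average H-index of the top 100 scholars. A comparison of that kind can be sketched as the difference between the mean of an indicator over the top k of each ranking. The function names and the four-scholar toy data below are assumptions for illustration; the paper's exact interval definition is not given on this page.

```python
def mean_top_k(ranking, metric, k):
    """Mean of `metric` over the first k names in `ranking`."""
    top = ranking[:k]
    return sum(metric[name] for name in top) / len(top)


def interval_gain(improved, baseline, metric, k):
    """Top-k mean under the improved ranking minus that under the baseline."""
    return mean_top_k(improved, metric, k) - mean_top_k(baseline, metric, k)


# Toy data only: hypothetical H-indexes and two hypothetical rankings.
h_index = {"a": 10, "b": 30, "c": 20, "d": 5}
baseline_ranking = ["a", "b", "c", "d"]
improved_ranking = ["b", "c", "a", "d"]

gain = interval_gain(improved_ranking, baseline_ranking, h_index, k=2)
print(gain)
```

A positive gain means the improved ranking places scholars with a higher indicator value in the top k than the baseline does.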