Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (8): 75-85    DOI: 10.11925/infotech.2096-3467.2020.0002
Expanding Scholar Labels with Research Similarity and Co-authorship Network
Sheng Jiaqi, Xu Xin
Department of Information Management, Faculty of Economics and Management, East China Normal University, Shanghai 200062, China
Abstract  

[Objective] This paper expands the academic labels of researchers using the abstracts of their publications, aiming to predict their future research interests. [Methods] First, we extracted basic labels from abstracts with the TF-IDF method. Then, we identified researchers who share similar academic interests and researchers linked by co-authorship. Finally, we expanded the basic labels with labels from these similar scholars and team members. [Results] Compared with existing methods, the proposed method increased the recall of prediction by 8.33% on average. [Limitations] The sample size was small, and we only examined scholarly articles in one language. [Conclusions] The proposed method can predict scholars’ future research interests.

Keywords: Label Expansion; Topic Similarity; Co-authorship Network
Received: 02 January 2020      Published: 14 September 2020
Chinese Library Classification (ZTFLH): TP393
Corresponding Authors: Xu Xin     E-mail: xxu@infor.ecnu.edu.cn

Cite this article:

Sheng Jiaqi, Xu Xin. Expanding Scholar Labels with Research Similarity and Co-authorship Network. Data Analysis and Knowledge Discovery, 2020, 4(8): 75-85.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0002     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I8/75

Overall Experiment Flow
No. Label           Weight    No. Label        Weight    No. Label                Weight
1   network         0.1390    18  management   0.0320    35  information science  0.0247
2   MLIS            0.1071    19  annotation   0.0318    36  construction         0.0246
3   information     0.1038    20  description  0.0298    37  field                0.0242
4   library         0.1018    21  species      0.0290    38  diversity            0.0235
5   website         0.0810    22  data         0.0290    39  current status       0.0230
6   knowledge       0.0544    23  flora        0.0283    40  algorithm            0.0229
7   service         0.0507    24  survey       0.0275    41  Chinese              0.0215
8   link            0.0504    25  books        0.0274    42  Internet             0.0206
9   evaluation      0.0444    26  papers       0.0270    43  e-government         0.0206
10  cited           0.0441    27  sample       0.0269    44  market orientation   0.0194
11  download count  0.0439    28  journal      0.0265    45  academic             0.0193
12  training        0.0437    29  foundation   0.0264    46  extraction           0.0193
13  analysis        0.0434    30  public       0.0258    47  text                 0.0189
14  resources       0.0424    31  indicator    0.0256    48  condition            0.0186
15  innovation      0.0393    32  education    0.0254    49  process              0.0183
16  influence       0.0335    33  development  0.0254    50  significant          0.0179
17  ontology        0.0333    34  model        0.0251
Top 50 Basic Labels from the Papers of Duan Yufeng
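The weighted labels in the table above come from TF-IDF. As a rough illustration only, the sketch below ranks terms for one scholar in pure Python. It assumes the abstracts are already segmented into tokens, treats all of a scholar's abstracts as one document, and uses the plain `tf * log(N / df)` variant; the article page does not state which variant or preprocessing the authors used. The function name `tfidf_labels` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_labels(corpus, scholar, top_k=5):
    """Return the top_k (term, weight) pairs for one scholar, ranked by TF-IDF.

    corpus maps a scholar name to the token list of that scholar's abstracts.
    """
    n_docs = len(corpus)
    # Document frequency: in how many scholars' texts each term occurs.
    doc_freq = Counter()
    for tokens in corpus.values():
        doc_freq.update(set(tokens))

    tokens = corpus[scholar]
    term_counts = Counter(tokens)
    # Assumed variant: relative term frequency times log(N / df).
    weights = {
        term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
        for term, count in term_counts.items()
    }
    # Sort by descending weight, breaking ties alphabetically for stable output.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical, already-tokenized toy corpus (not data from the article).
toy_corpus = {
    "scholar_a": ["network", "library", "network", "website", "link"],
    "scholar_b": ["library", "patent", "law", "patent"],
    "scholar_c": ["library", "ontology", "annotation"],
}

for term, weight in tfidf_labels(toy_corpus, "scholar_a", top_k=3):
    print(f"{term}\t{weight:.4f}")
```

In this toy corpus "network" ranks first for scholar_a, while "library" receives a weight of zero because it occurs for every scholar, which is the behaviour that makes TF-IDF favour distinctive terms as labels.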
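Scholar    Academic labels (partial)
Duan Yufeng    knowledge, information, management, information science, network, analysis, Internet (互联网), books, knowledge economy, library, enterprise, Internet, digital, e-mail, information network, link, MEDLINE, references, Internet services
Qiu Junping    literature, information, information science, metrics, knowledge, intelligence, analysis, resources, management, networking, citation, evaluation, intellectual property, network, knowledge economy, books, science, journal, discipline, library, library science
Hu Changping    information, information science, intelligence, intellectual property, analysis, literature, library, networking, network, knowledge, management, resources, enterprise, information management, service, evaluation, intelligence information, discipline, society, knowledge economy, system
Ma Haiqun    information, intellectual property, knowledge, management, library, information science, network, intelligence, patent, knowledge economy, literature, analysis, copyright, information management, metrics, law, consulting industry
Wang Hongxin    information science, discipline, level, science, metrics, papers, literature, journal, system, information, database, distribution, citation, dynamics, knowledge, systematization, ordering, critical review, self-citation, heteronomy, CNKI, dual regulation (双律性)
Yue Ya    information science, information, discipline, knowledge, database, literature, network, multimedia, intelligence, management, trade secrets, level, bibliography, copyright, electronic, CIP, competitive, law, commerce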
Academic Labels of Important Members of Duan Yufeng's Team Before 2004
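Scholar    Academic labels (partial)    Topic similarity
Liu Danfeng    library, Party school, database, e-books, resources, human resources, awareness, digitization, development and utilization, service, books, Fujian Province, information, management, digital    0.719
Wang Chun    library, document studies, literature, resources, information, readers, digital, construction, digitization, western region, collection, libraries, e-books, China, China (中国), network, ancient books    0.615
Ruan Jianhai    finance and securities, Winisis, information, database, Internet (因特网), Internet, papers, resources, retrieval, precision, recall, free software, ISISforDOS, CDS    0.545
Zhou Wenrong    knowledge, database, consulting industry, library, retrieval, management, universities, freedom, retrieval system, high technology, intelligence, modernization, consulting, articles, dissemination    0.538
Zhang Xiaolin    library, digital, description, construction, standards and specifications, open, MR, Registry, data, resources, science, technology, retrieval, information, gateway, XML, Metadata    0.532
Guo Xiaogang    library, legislation, database, users, librarians, information retrieval, legal system construction, analysis, digitization, network, education, theory, information, environment    0.499
Yan Feng    retrieval, literature, information, concept, knowledge, language, development, resources, intellectual property, WTO, natural language, resource sharing, information technology, information retrieval, information security    0.469
Qi Min    retrieval, bookstore, evaluation, library, precision, recall, literature database, book purchasing, CJN, journal network, usability, performance indicators, timeliness    0.460
Chai Yikui    donated books, library, old books, literature, new books, aging, resources, professional books, publishing, subject indexing, acquisition costs, duplicate copies, quality, knowledge structure, timeliness, slow-selling, collection    0.453
Zhang Dongmei    library, Java, network, collection, readers, database, digital, information, demand, networking, classification, retrieval, universities, resources, literature, data integrity, full-text retrieval    0.448
Labels of the Scholars with the Highest Topic Similarity to Duan Yufeng Before 2004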
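The topic similarity scores above feed the expansion step described in the abstract, where a scholar's basic labels are extended with labels from similar scholars and team members. The sketch below shows one way this could work. It assumes cosine similarity over label-weight vectors and a simple set union; this page does not state the similarity measure or the merging rule the authors used. The names `cosine_similarity`, `expand_labels`, and all toy data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse label-weight dictionaries."""
    dot = sum(weight * b[label] for label, weight in a.items() if label in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_labels(target, others, teammates, top_n=1):
    """Add labels from the top_n most similar scholars and from all teammates."""
    # Rank non-teammates by similarity to the target (assumed: cosine).
    ranked = sorted(
        (name for name in others if name not in teammates),
        key=lambda name: cosine_similarity(target, others[name]),
        reverse=True,
    )
    sources = ranked[:top_n] + [name for name in teammates if name in others]
    expanded = set(target)
    for name in sources:
        expanded |= set(others[name])
    return expanded


# Hypothetical label weights (not values from the article).
target = {"library": 0.5, "network": 0.5}
others = {
    "peer_x": {"library": 0.6, "digital": 0.4},
    "peer_y": {"patent": 1.0},
    "coauthor_z": {"ontology": 0.7, "network": 0.3},
}

expanded = expand_labels(target, others, teammates=["coauthor_z"], top_n=1)
print(sorted(expanded))
```

Here peer_x is the most similar non-teammate, so its labels are merged together with those of coauthor_z. A real pipeline would likely also cap or re-weight the borrowed labels rather than take a plain union.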
Recall of Basic Label and Expanded Label Prediction
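For readers unfamiliar with the metric, recall here can be read as the share of a scholar's later labels that a predicted label set already contains. The sketch below uses that standard definition; the exact evaluation protocol of the article is not given on this page, and the label sets are hypothetical.

```python
def recall(predicted, actual):
    """Share of the actual labels that also appear in the predicted set."""
    actual = set(actual)
    if not actual:
        return 0.0
    return len(set(predicted) & actual) / len(actual)


# Hypothetical label sets (not data from the article).
future_labels = {"library", "ontology", "annotation", "species"}
basic_labels = {"library", "network"}
expanded_labels = basic_labels | {"ontology", "annotation"}

basic_recall = recall(basic_labels, future_labels)
expanded_recall = recall(expanded_labels, future_labels)
print(basic_recall, expanded_recall)
```

In this toy case the expanded set recovers more of the future labels than the basic set, which is the kind of gain the comparison of basic and expanded labels is meant to measure.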
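Prediction stage: Stage 2
    Basic labels only: configuration, factor, influence, Internet, measurement, object
    Shared by both: website, resources, field, construction, network, current status, library, references, information, analysis, metrics, link
    Expanded labels only: United States, foundation, analytical method, web page, literature, Web, level, important, system, evaluation, classification, application
Prediction stage: Stage 3
    Basic labels only: sample, factor, difference, efficiency
    Shared by both: foundation, information, field, classification, network, construction, service, content, knowledge, indicator, library, academic, data
    Expanded labels only: analytical method, scientific research, specialty, comparison, system, optimization, environment, author, team
Prediction stage: Stage 4
    Basic labels only: practice, extraction, sample
    Shared by both: resources, foundation, field, development, service, description, construction, current status, knowledge, model, specialty, books, library, academic, evaluation, data, species, digital, annotation, analysis
    Expanded labels only: reading, survey, recognition, ontology, plants, papers, public, innovation, education, citation, related, organization, optimization, journal, society, graduate students, cited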
The Distribution of Correct Prediction Labels