Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (2/3): 231-238     https://doi.org/10.11925/infotech.2096-3467.2019.0600
Developing Modularity Scientometrics System with Distributed Technology*
Shi Hongbo1,2, Guo Hongmei1, Yue Ting1,2, Qian Li1,2, Huang Dingyu1, Chang Zhijun1
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract

[Objective] To design and develop a modular scientometric indicator analysis platform that lets analysts customize multidimensional conditions and compute indicators efficiently in real time. [Context] With massive volumes of scientific literature data, traditional relational databases are inefficient and slow for large-scale scientometric computation; distributed big data technology provides the technical basis for a real-time scientometric analysis platform. [Methods] We designed an indicator management model and a workflow-based indicator construction process that decomposes an analysis task into multiple independently computable units. Using distributed ES indexes, Redis set computation, and precomputed indicators, the platform converts statistical computation tasks into inverted-index queries and set operations. [Results] The platform gives users a standardized process for selecting and constructing indicators, dynamically extensible (elastic) task configuration, and near-real-time indicator computation. [Conclusions] By building on distributed big data technology and abstracting and encapsulating computation tasks, we implemented an efficient, general-purpose modular analysis platform; this study may also serve as a reference for related analysis and decision-support systems.

Keywords: Distributed Technology; Modularity Analysis; Scientometrics
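The idea stated in the Methods, turning a statistical task into inverted-index lookups plus set operations over independently computable indicator units, can be illustrated with a minimal sketch. This is not the authors' implementation: plain Python sets stand in for the ES inverted index and the Redis set operations, and the records, field names, and function names are all hypothetical.

```python
from collections import defaultdict

# Toy records standing in for indexed literature metadata (all values hypothetical).
PAPERS = [
    {"id": "p1", "institution": "A", "year": 2018, "citations": 12},
    {"id": "p2", "institution": "A", "year": 2019, "citations": 3},
    {"id": "p3", "institution": "B", "year": 2019, "citations": 7},
    {"id": "p4", "institution": "A", "year": 2019, "citations": 0},
]


def build_inverted_index(papers, fields):
    """Map each (field, value) pair to the set of paper ids that carry it."""
    index = defaultdict(set)
    for paper in papers:
        for field in fields:
            index[(field, paper[field])].add(paper["id"])
    return index


def select(index, **conditions):
    """Intersect the posting sets of all conditions (a multidimensional filter)."""
    postings = [index.get(item, set()) for item in conditions.items()]
    return set.intersection(*postings) if postings else set()


# Each indicator is an independent unit that only needs the selected id set.
def paper_count(ids, papers_by_id):
    return len(ids)


def total_citations(ids, papers_by_id):
    return sum(papers_by_id[i]["citations"] for i in ids)


INDICATORS = {"paper_count": paper_count, "total_citations": total_citations}


def run_task(index, papers_by_id, conditions, indicator_names):
    """Resolve the filter once, then evaluate every requested indicator unit."""
    ids = select(index, **conditions)
    return {name: INDICATORS[name](ids, papers_by_id) for name in indicator_names}


INDEX = build_inverted_index(PAPERS, ["institution", "year"])
BY_ID = {p["id"]: p for p in PAPERS}
RESULT = run_task(
    INDEX,
    BY_ID,
    {"institution": "A", "year": 2019},
    ["paper_count", "total_citations"],
)
print(RESULT)
```

Because each indicator depends only on the resolved id set, new indicators can be registered without touching the filtering step, which is one plausible reading of the "independently computable units" described in the abstract.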
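Received: 2019-06-03      Published: 2020-04-26
Chinese Library Classification (CLC) number: TP391
Funding: *This work is one of the research outcomes of the Chinese Academy of Sciences fund project "Construction of a Modular Resource Service Platform Based on Scientometric Data" (院1750).
Corresponding author: Shi Hongbo     E-mail: shihb@mail.las.ac.cn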
Cite this article:
Shi Hongbo, Guo Hongmei, Yue Ting, Qian Li, Huang Dingyu, Chang Zhijun. Developing Modularity Scientometrics System with Distributed Technology[J]. Data Analysis and Knowledge Discovery, 2020, 4(2/3): 231-238.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2019.0600      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I2/3/231
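Fig.1  System logical architecture
Fig.2  Task and indicator management scheme
Fig.3  Elastic scaling scheme for indicator computation
Fig.4  Computation by queue partitioning
Fig.5  Relational database computation and ES indicator computation
Fig.6  Schematic of the set-based indicator computation flow
Fig.7  Schematic of JSON result storage and use
Fig.8  Data source selection and statistical scope selection
Fig.9  Target object and indicator selection process
Fig.10  Final confirmation of the indicator task
Fig.11  Indicator computation results and display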
Table 1  Indicator computation efficiency
Number of indicators    Total computation time    Average time per indicator
15    64 s    4.3 s