Big Data Platform for Sci-Tech Literature Based on Distributed Technology
Chang Zhijun1,2, Qian Li1,2, Xie Jing1,2, Wu Zhenxin1,2, Zhang Hu1, Yu Qianqian1, Wang Ying1, Wang Yongji3
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2 Department of Library Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China
3 Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Abstract [Objective] This study addresses the problems of storing and providing online access to massive full-text documents, governing large-scale data, and low service performance, with the aim of building a big data platform for sci-tech literature. [Methods] First, we analyzed the characteristics of distributed big data services for science and technology. Then, we adopted a co-tenant deployment strategy based on servers and networks. Finally, we designed a big data platform for sci-tech literature with a “5+2” overall architecture. [Results] We established a PB-level big data platform for sci-tech literature with a storage capacity of 200 TB, which has collected 320 million document entities and 6 billion entity relationships. MapReduce-based metadata processing performance improved threefold, and a knowledge service architecture based on the new technologies was formed. [Limitations] Streaming data is not yet adequately processed, so the system cannot respond promptly to newly arriving data. [Conclusions] The new platform supports the knowledge discovery services of the National Science Library, Chinese Academy of Sciences, as well as the intelligent scientific research system. It provides good online services and improves the processing and service capabilities for sci-tech literature.
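The abstract reports a threefold speed-up for MapReduce-based metadata processing but does not describe the job itself. As a rough illustration of the map, shuffle, and reduce phases such a job goes through, the following single-process Python sketch merges duplicate metadata records by DOI. The record fields (`doi`, `title`, `year`), the DOI-based deduplication task, and every function name are hypothetical assumptions chosen for illustration, not the authors' implementation; on a Hadoop cluster the shuffle step is performed by the framework across nodes rather than in a local dictionary.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical metadata record: field name -> value (e.g. "doi", "title", "year").
Record = Dict[str, str]


def map_record(record: Record) -> Iterator[Tuple[str, Record]]:
    """Map phase: emit (normalized DOI, record) so duplicates share a key.

    Records without a DOI are skipped in this sketch.
    """
    doi = record.get("doi", "").strip().lower()
    if doi:
        yield doi, record


def shuffle(pairs: Iterable[Tuple[str, Record]]) -> Dict[str, List[Record]]:
    """Shuffle phase: group all values emitted under the same key."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key: str, records: List[Record]) -> Record:
    """Reduce phase: merge duplicates, keeping the first non-empty value per field."""
    merged: Record = {"doi": key}
    for rec in records:
        for field, value in rec.items():
            if field != "doi" and value and field not in merged:
                merged[field] = value
    return merged


def run_job(records: Iterable[Record]) -> List[Record]:
    """Chain the three phases; output is sorted by key for reproducibility."""
    pairs = chain.from_iterable(map_record(r) for r in records)
    groups = shuffle(pairs)
    return [reduce_group(k, v) for k, v in sorted(groups.items())]


if __name__ == "__main__":
    sample = [
        {"doi": "10.1000/ABC", "title": "Big Data Platforms", "year": ""},
        {"doi": "10.1000/abc ", "title": "", "year": "2021"},
        {"doi": "10.1000/xyz", "title": "Knowledge Graphs", "year": "2019"},
    ]
    for rec in run_job(sample):
        print(rec)
```

With the sample input, the two spellings of `10.1000/abc` fall into the same reduce group and are merged into one record carrying both the title and the year. Keying by normalized DOI is what makes the reduce step parallelizable, since each group can be merged independently; a production pipeline would also need a fallback matching rule for records that lack a DOI.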
Received: 04 December 2018
Published: 12 April 2021
Corresponding Author: Chang Zhijun
E-mail: changzj@mail.las.ac.cn