Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (7): 113-122    DOI: 10.11925/infotech.2096-3467.2018.1355
Methodology and Tools to Enrich Sci-Tech Big Data
Beibei Kong (1), Jing Xie (1,2), Li Qian (1,2), Zhijun Chang (1,2), Zhenxin Wu (1,2)
(1) National Science Library, Chinese Academy of Sciences, Beijing 100190, China
(2) Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract  

[Objective] This paper addresses problems facing sci-tech big data, such as scattered sources, low quality, and sparse content. [Methods] We used value-added computing methods, including data cleansing, entity alignment, entity field fusion, and conflict detection, to develop tools for enriching sci-tech big data. [Results] The tools aligned entity data for personnel, organizations, conferences, and journals, as well as the relationships among them. The content of entity fields increased 5- to 10-fold, and the number of entity analysis dimensions increased 2- to 3-fold. [Limitations] The timeliness and standardization of the value-added data need further optimization based on service needs. [Conclusions] The proposed methods and tools enhance knowledge discovery from sci-tech big data and strengthen intelligent information analysis systems.

Key words: Sci-Tech Big Data, Data Appreciation, Enrichment Method
Received: 03 December 2018      Published: 06 September 2019
Chinese Library Classification (ZTFLH): TP391
Corresponding Authors: Jing Xie     E-mail: xiej@mail.las.ac.cn

Cite this article:

Beibei Kong, Jing Xie, Li Qian, Zhijun Chang, Zhenxin Wu. Methodology and Tools to Enrich Sci-Tech Big Data. Data Analysis and Knowledge Discovery, 2019, 3(7): 113-122.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.1355     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2019/V3/I7/113

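Researcher type                               | Data source surveyed                              | Adopted
Chinese Academy of Sciences (CAS) researchers | Websites of the individual institutes             |
                                              | CAS Institutional Repository                      |
                                              | Registered users of 中国科讯                       |
University researchers                        | Official university websites                      | ×
                                              | 中国科学家在线 (Chinese Scientists Online)          |
Other researchers                             | Project data                                      |
                                              | Data provided by research institutions            |
                                              | Blog sites                                        | ×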
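
Aligned field                 | Aligned field      | Aligned field
Name                          | Professional title | Postal code
Institution                   | Academic title     | Résumé
Email                         | Specialty          | Research field
Department or school          | Education level    | Honors
Laboratory or research group  | Phone              | Personal homepage
Gender                        | Fax                | User photo
Administrative position       | Mailing address    | ORCID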
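
The abstract names data cleansing, entity alignment, entity field fusion, and conflict detection, but this page does not give the algorithms behind them. The following is only a minimal sketch of how those steps could fit together for person records drawn from several sources. The alignment key (normalized name plus institution), the "first non-empty value wins" fusion rule, the function names, and the sample records are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def normalize(value):
    """Basic cleansing: trim, collapse inner whitespace, lowercase."""
    return " ".join(str(value).split()).lower()


def align_key(record):
    """Hypothetical alignment key: normalized name + institution."""
    return (
        normalize(record.get("name", "")),
        normalize(record.get("institution", "")),
    )


def fuse(records):
    """Group records by alignment key, merge their fields, report conflicts."""
    groups = defaultdict(list)
    for rec in records:
        groups[align_key(rec)].append(rec)

    fused, conflicts = [], []
    for key, recs in groups.items():
        merged = {}
        for rec in recs:
            for field, value in rec.items():
                # Skip provenance and empty values.
                if field == "source" or value in (None, ""):
                    continue
                if field not in merged:
                    # Fusion rule (assumed): first non-empty value wins.
                    merged[field] = value
                elif normalize(merged[field]) != normalize(value):
                    # Conflict detection: same field, different values.
                    conflicts.append((key, field, merged[field], value))
        fused.append(merged)
    return fused, conflicts


# Made-up sample records; field names loosely follow the aligned-field table.
records = [
    {"source": "institute_site", "name": "Zhang San",
     "institution": "Example Institute", "email": "zs@example.org",
     "title": "Professor"},
    {"source": "repository", "name": " zhang  san ",
     "institution": "example institute", "phone": "010-0000",
     "title": "Associate Professor"},
    {"source": "project_data", "name": "Li Si",
     "institution": "Example University", "orcid": "0000-0000-0000-0000"},
]

fused, conflicts = fuse(records)
for person in fused:
    print(person)
for conflict in conflicts:
    print("conflict:", conflict)
```

In this sketch the two "Zhang San" records collapse into one entity that gains a phone number from the second source, while the disagreeing title is kept out of the merged record and reported as a conflict for later review. A production pipeline would likely need stronger keys (for example email or ORCID) to separate people who share a name and institution.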