Methodology and Tools to Enrich Sci-Tech Big Data*

Data Analysis and Knowledge Discovery, 2019, Vol. 3, Issue 7: 113-122
doi: https://doi.org/10.11925/infotech.2096-3467.2018.1355

Application Paper

Beibei Kong¹, Jing Xie¹,², Li Qian¹,², Zhijun Chang¹,², Zhenxin Wu¹,²

1(National Science Library, Chinese Academy of Sciences, Beijing 100190, China)
2(Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China)

Abstract:

[Objective] This paper addresses problems facing sci-tech big data, such as dispersed data sources, low data quality, and thin content. [Methods] We apply value-added computing methods, including data cleaning, entity alignment, entity field fusion, and conflict detection, to design and develop a toolset for the value-added enrichment of sci-tech big data. [Results] The tools align entity data for researchers, institutions, conferences, and journals, as well as for the relations among these entities. Entity field content increased 5- to 10-fold, and the number of entity analysis dimensions increased 2- to 3-fold. [Limitations] The timeliness and standardization of the value-added data still need continual optimization in practical applications, guided by service needs. [Conclusions] The results broaden and deepen the data services of the sci-tech big data knowledge discovery platform and of related intelligent information analysis systems.

Key words: Sci-Tech Big Data; Data Value-Adding; Enrichment Method
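The abstract names entity field fusion and conflict detection as value-added steps but does not describe how they are implemented. As a hedged illustration only, the sketch below shows one common way these two steps can work once several source records have been aligned to the same researcher: single-valued fields take the value from the most trusted source and every disagreement is logged as a conflict, while list-valued fields are unioned. The record format, the source names, and the "most trusted source wins" rule are assumptions for illustration, not the authors' tool.

```python
from dataclasses import dataclass, field


@dataclass
class FusionResult:
    """A fused entity plus, per field, every (source, value) pair that disagreed."""
    entity: dict
    conflicts: dict = field(default_factory=dict)


def fuse_records(records, source_priority, multi_valued=frozenset()):
    """Fuse records that have already been aligned to one entity.

    records: list of (source_name, {field: value}) pairs (hypothetical format).
    source_priority: source names from most to least trusted (an assumption).
    multi_valued: fields whose values are unioned rather than resolved.
    """
    rank = {src: i for i, src in enumerate(source_priority)}
    # Visit the most trusted source first; unknown sources go last (stable sort).
    ordered = sorted(records, key=lambda r: rank.get(r[0], len(rank)))

    entity, kept_from, conflicts = {}, {}, {}
    for source, rec in ordered:
        for name, value in rec.items():
            if value in (None, "", []):
                continue  # trivial cleaning: ignore empty values
            if name in multi_valued:
                # Union list-like fields, keeping first-seen order.
                merged = entity.setdefault(name, [])
                for v in value if isinstance(value, list) else [value]:
                    if v not in merged:
                        merged.append(v)
            elif name not in entity:
                entity[name] = value  # most trusted non-empty value wins
                kept_from[name] = source
            elif entity[name] != value:
                # Conflict detection: log the kept value and each disagreeing one.
                log = conflicts.setdefault(name, [(kept_from[name], entity[name])])
                log.append((source, value))
    return FusionResult(entity, conflicts)
```

Letting the most trusted source win is only one possible resolution rule; the logged conflicts could just as well be routed to manual review or to a voting scheme.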

Received: 2018-12-03; Published: 2019-09-06

CLC number: TP391

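Funding: *This work is one of the research results of the project "Research on User Profile Models and Key Technologies" (No. 科1810) under the Next-Generation National Sci-Tech Innovation Open Knowledge Service System program of the National Science and Technology Library (NSTL), and of the project "Construction of a Knowledge Discovery Service Platform Based on Big Data Computing" (No. 院1759) under the Literature and Information Capacity Building Program of the Chinese Academy of Sciences.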

Corresponding author: Jing Xie, E-mail: xiej@mail.las.ac.cn

Cite this article:

Beibei Kong, Jing Xie, Li Qian, Zhijun Chang, Zhenxin Wu. Methodology and Tools to Enrich Sci-Tech Big Data. Data Analysis and Knowledge Discovery, 2019, 3(7): 113-122.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.1355 或 https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2019/V3/I7/113

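Figures and tables:

Overall framework for the enrichment of the sci-tech big data platform
Data sources selected for value-adding researcher data
Field alignment format for researcher data from multiple sources
Entity data normalization methods
Fusion of entities and entity relations
Researcher data acquisition
Researcher data normalization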
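Several of the figure and table titles above concern multi-source field alignment and researcher data normalization. The page does not show how these are done, so the following is only a minimal sketch under stated assumptions: each source's field names are mapped to a unified schema through a hypothetical `FIELD_MAP`, names and affiliations are normalized (Unicode NFKC folding, lowercasing, punctuation removal), and records whose normalized name and affiliation match are grouped as candidates for the same researcher. The source names, field names, and matching rule are illustrative assumptions, not the authors' implementation.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical per-source field names mapped onto one unified schema.
FIELD_MAP = {
    "source_a": {"author_name": "name", "org": "affiliation", "mail": "email"},
    "source_b": {"fullName": "name", "institution": "affiliation", "email": "email"},
}


def normalize_text(text):
    """NFKC-fold (e.g. full-width to half-width), lowercase, strip punctuation, collapse spaces."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so CJK characters survive
    return " ".join(text.split())


def name_key(name):
    """Order-insensitive name key, so 'Zhang, Wei' and 'Wei Zhang' match.

    Caveat: this also merges genuinely different people such as 'Li Wei' and 'Wei Li'.
    """
    return " ".join(sorted(normalize_text(name).split()))


def to_unified(source, raw):
    """Rename a raw record's fields to the unified schema; unmapped fields are dropped."""
    mapping = FIELD_MAP[source]
    return {mapping[k]: v for k, v in raw.items() if k in mapping}


def align(records):
    """Group (source, raw_record) pairs whose name key and normalized affiliation agree."""
    groups = defaultdict(list)
    for source, raw in records:
        rec = to_unified(source, raw)
        key = (name_key(rec.get("name", "")), normalize_text(rec.get("affiliation", "")))
        groups[key].append((source, rec))
    return dict(groups)
```

Exact matching on normalized keys is deliberately simple; records that still differ after normalization (for instance an abbreviated affiliation such as "Peking Univ.") would need a fuzzier comparison.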
