New Technology of Library and Information Service  2014, Vol. 30 Issue (11): 79-87    DOI: 10.11925/infotech.1003-3513.2014.11.12
Design and Application of Data Fusion Software on Papers Indexed By SCI and EI
Yu Jian1, Xu Chen2, Wang Meijun2, Zhang Minhao2, Yue Zhen'gan2, Wu Xia3, Zhao Chunmei3
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
2 Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China;
3 Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China

[Objective] This paper designs software that performs duplicate checking and data fusion for papers indexed by SCI and EI. [Context] The software helps paper analysts obtain a dataset in a uniform format and meets the demands of micro-level analysis of subject information. [Methods] Two automatic algorithms and one semi-automatic algorithm perform accurate duplicate checking on the papers indexed by SCI and EI. Data fusion is based on a detailed analysis of the text features of the SCI and EI data fields. [Results] The software marks papers that appear in both the SCI and EI sets and creates a de-duplicated, fused data sheet. [Conclusions] The software effectively solves the problem of constructing a dataset from different data sources, and its design ideas can also be applied to other databases.
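
The abstract does not say which algorithms the software uses, so the following is only a minimal Python sketch of the general idea, not the authors' method. It assumes two automatic rules (an identical DOI, or an identical title after normalization) and one semi-automatic rule (a high title similarity that is flagged for manual review). The field names `title` and `doi`, the function names, and the 0.9 threshold are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase a title and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", title.lower())


def match_records(sci_records, ei_records, review_threshold=0.9):
    """Compare every SCI record with every EI record.

    Returns (duplicates, needs_review), each a list of
    (sci_index, ei_index) pairs. The rules are illustrative assumptions:
      - automatic rule 1: identical non-empty DOI
      - automatic rule 2: identical normalized title
      - semi-automatic:   similar title, left for a human to confirm
    """
    duplicates, needs_review = [], []
    for i, sci in enumerate(sci_records):
        sci_doi = (sci.get("doi") or "").strip().lower()
        sci_title = normalize_title(sci.get("title", ""))
        for j, ei in enumerate(ei_records):
            ei_doi = (ei.get("doi") or "").strip().lower()
            ei_title = normalize_title(ei.get("title", ""))
            if sci_doi and sci_doi == ei_doi:
                duplicates.append((i, j))
            elif sci_title and sci_title == ei_title:
                duplicates.append((i, j))
            elif SequenceMatcher(None, sci_title, ei_title).ratio() >= review_threshold:
                needs_review.append((i, j))
    return duplicates, needs_review


# Made-up sample records, for illustration only.
sci = [
    {"title": "Design of Data Fusion Software", "doi": "10.1000/abc"},
    {"title": "Infrared Detector Noise Analysis", "doi": ""},
    {"title": "Graph methods for text", "doi": ""},
]
ei = [
    {"title": "DESIGN OF DATA-FUSION SOFTWARE", "doi": "10.1000/ABC"},
    {"title": "Infrared detector noise analysis.", "doi": None},
    {"title": "Graph method for texts", "doi": ""},
]

dups, review = match_records(sci, ei)
```

In this sketch the pairs in `dups` would be marked as duplicates automatically, while the pairs in `review` would be shown to an analyst before any records are merged into the fused sheet.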

Key words: Duplicate checking; Data fusion; EI; SCI; Software design
Received: 06 May 2014      Published: 18 December 2014
PACS:  G356  

Cite this article:

Yu Jian, Xu Chen, Wang Meijun, Zhang Minhao, Yue Zhen'gan, Wu Xia, Zhao Chunmei. Design and Application of Data Fusion Software on Papers Indexed By SCI and EI. New Technology of Library and Information Service, 2014, 30(11): 79-87.


