New Technology of Library and Information Service, 2014, Vol. 30, Issue (11): 79-87. DOI: 10.11925/infotech.1003-3513.2014.11.12
Design and Application of Data Fusion Software on Papers Indexed By SCI and EI
Yu Jian1, Xu Chen2, Wang Meijun2, Zhang Minhao2, Yue Zhen'gan2, Wu Xia3, Zhao Chunmei3
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
2 Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China;
3 Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
[Objective] A software tool is designed to perform duplicate checking and data fusion on papers indexed by SCI and by EI. [Context] The software helps paper analysts obtain a dataset in a single, uniform format and meets the demand for micro-level analysis of subject information. [Methods] Two automatic algorithms and one semi-automatic algorithm are used to carry out accurate duplicate checking between the papers indexed by SCI and those indexed by EI. Data fusion is based on a detailed analysis of the text features of the SCI and EI data fields. [Results] The software marks papers that are duplicated between the SCI and EI sets and creates a de-duplicated, fused data sheet. [Conclusions] The problem of constructing a dataset from different data sources is solved effectively, and the design ideas can also be applied to other databases.
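The abstract does not spell out the three duplicate-checking algorithms, so the following is only a minimal sketch of what cross-database duplicate checking and fusion can look like, not the authors' method. It assumes a hypothetical simplified `Record` type (not the real SCI or EI export fields), a first automatic pass on DOI, a second automatic pass on normalized title plus year, and a "semi-automatic" step that merely flags same-title, adjacent-year pairs for manual review. The helper names `normalize_title` and `fuse` are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical simplified record; real SCI/EI exports have many more fields."""
    source: str                # "SCI" or "EI"
    title: str
    year: int
    doi: Optional[str] = None


def normalize_title(title: str) -> str:
    """Lowercase the title and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", title.lower())


def fuse(sci, ei):
    """Fuse SCI and EI records, keeping SCI as the base set.

    Returns (fused, duplicates, review):
      duplicates -- indices of EI records treated as duplicates and dropped
      review     -- indices of EI records flagged for manual checking (kept for now)
    """
    sci_dois = {r.doi.strip().lower() for r in sci if r.doi}
    sci_title_year = {(normalize_title(r.title), r.year) for r in sci}
    years_by_title = {}        # normalized title -> set of SCI publication years
    for r in sci:
        years_by_title.setdefault(normalize_title(r.title), set()).add(r.year)

    duplicates, review = [], []
    for i, r in enumerate(ei):
        key = normalize_title(r.title)
        if r.doi and r.doi.strip().lower() in sci_dois:
            duplicates.append(i)   # automatic pass 1: exact DOI match
        elif (key, r.year) in sci_title_year:
            duplicates.append(i)   # automatic pass 2: same normalized title and year
        elif any(abs(y - r.year) == 1 for y in years_by_title.get(key, ())):
            review.append(i)       # assumed semi-automatic step: a human decides

    dropped = set(duplicates)
    fused = list(sci) + [r for i, r in enumerate(ei) if i not in dropped]
    return fused, duplicates, review


sci = [
    Record("SCI", "Data Fusion of SCI and EI", 2014, "10.1000/abc"),
    Record("SCI", "Another Paper", 2013),
    Record("SCI", "Online First Paper", 2014),
]
ei = [
    Record("EI", "DATA FUSION OF SCI AND EI.", 2014, "10.1000/ABC"),
    Record("EI", "Another paper", 2013),
    Record("EI", "Online first paper", 2015),
    Record("EI", "Unique EI Paper", 2012),
]
fused, duplicates, review = fuse(sci, ei)
print(len(fused), duplicates, review)  # prints: 5 [0, 1] [2]
```

Keeping SCI as the base set and leaving flagged records in the fused sheet until a person confirms them is one possible design choice; an actual tool would also need to merge field values from both sources rather than simply drop the EI copy.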

Key words: Duplicate checking; Data fusion; EI; SCI; Software design
Received: 06 May 2014      Published: 18 December 2014
CLC number: G356

Cite this article:

Yu Jian, Xu Chen, Wang Meijun, Zhang Minhao, Yue Zhen'gan, Wu Xia, Zhao Chunmei. Design and Application of Data Fusion Software on Papers Indexed By SCI and EI. New Technology of Library and Information Service, 2014, 30(11): 79-87.
