%A Yu Jian, Xu Chen, Wang Meijun, Zhang Minhao, Yue Zhen'gan, Wu Xia, Zhao Chunmei
%T Design and Application of Data Fusion Software on Papers Indexed By SCI and EI
%0 Journal Article
%D 2014
%J Data Analysis and Knowledge Discovery
%R 10.11925/infotech.1003-3513.2014.11.12
%P 79-87
%V 30
%N 11
%U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_3977.shtml}
%8 2014-11-25
%X

[Objective] To design software that performs duplicate checking and data fusion for papers indexed by SCI and EI. [Context] The software helps paper analysts obtain a dataset in a uniform format and meets the needs of micro-level analysis of subject information. [Methods] Two automatic algorithms and one semi-automatic algorithm are used to accurately detect duplicates among the papers indexed by SCI and EI. Data fusion is based on a detailed analysis of the text features of the SCI and EI data fields. [Results] The software marks papers that appear in both the SCI and EI records and generates a de-duplicated, fused data sheet. [Conclusions] The software effectively solves the problem of building a dataset from different data sources, and its design ideas can also be applied to other databases.
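
The abstract does not specify the two automatic algorithms or the semi-automatic one, so the following is only a minimal sketch of the general idea: marking papers that appear in both sources and producing a de-duplicated fused list. It assumes, purely for illustration, that records are dictionaries with `title`, `year`, and `doi` fields and that duplicates can be found by an exact DOI match or, failing that, a normalized title plus year match. The function names (`normalize_title`, `title_key`, `fuse`) and the `indexed_by` field are hypothetical and are not taken from the software described.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def normalize_doi(doi):
    """Return a lowercase DOI string, or None when the field is empty."""
    doi = (doi or "").strip().lower()
    return doi or None


def title_key(record):
    """Assumed fallback key: normalized title plus year (None if no title)."""
    title = normalize_title(record.get("title"))
    if not title:
        return None
    return (title, str(record.get("year", "")))


def fuse(sci_records, ei_records):
    """Merge two record lists, marking which index(es) contain each paper."""
    fused = [dict(rec, indexed_by=["SCI"]) for rec in sci_records]

    # Index the SCI records by DOI and by title key for fast lookup.
    by_doi, by_title = {}, {}
    for entry in fused:
        doi = normalize_doi(entry.get("doi"))
        if doi:
            by_doi.setdefault(doi, entry)
        key = title_key(entry)
        if key:
            by_title.setdefault(key, entry)

    for rec in ei_records:
        doi = normalize_doi(rec.get("doi"))
        match = by_doi.get(doi) if doi else None
        if match is None:
            key = title_key(rec)
            match = by_title.get(key) if key else None
        if match is not None:
            # Duplicate across sources: mark it instead of adding a new row.
            if "EI" not in match["indexed_by"]:
                match["indexed_by"].append("EI")
        else:
            fused.append(dict(rec, indexed_by=["EI"]))
    return fused


if __name__ == "__main__":
    sci = [{"title": "Data Fusion of SCI and EI Papers", "year": 2014, "doi": "10.1000/ABC"}]
    ei = [{"title": "Data fusion of SCI and EI papers.", "year": 2014, "doi": None}]
    for row in fuse(sci, ei):
        print(row["title"], row["indexed_by"])
```

Exact matching of this kind would miss records whose titles differ by more than case and punctuation; such borderline pairs are the sort of case a semi-automatic, analyst-reviewed step could handle.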