New Technology of Library and Information Service, 2014, Vol. 30, Issue 11: 79-87. DOI: 10.11925/infotech.1003-3513.2014.11.12
Design and Application of Data Fusion Software on Papers Indexed By SCI and EI
Yu Jian¹, Xu Chen², Wang Meijun², Zhang Minhao², Yue Zhen'gan², Wu Xia³, Zhao Chunmei³
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
2 Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China;
3 Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
Abstract  

[Objective] This paper designs software that performs duplicate checking and data fusion on papers indexed by SCI and by EI. [Context] The software helps paper analysts obtain a dataset in a single, uniform format and meets the demand for micro-level analysis of subject information. [Methods] Two automatic algorithms and one semi-automatic algorithm perform accurate duplicate checking between the SCI-indexed and EI-indexed papers. Data fusion is based on a detailed analysis of the text features of the SCI and EI data fields. [Results] The software marks papers that appear in both the SCI and the EI data and creates a de-duplicated, fused data sheet. [Conclusions] The software effectively solves the problem of building one dataset from different data sources, and its design ideas can also be applied to other databases.
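The abstract names two automatic duplicate-checking algorithms, one semi-automatic algorithm and a field-level fusion step, but does not describe them. The Python sketch below is a hypothetical illustration only, not the authors' software. It assumes SCI and EI records have already been parsed into dicts with hypothetical keys `title`, `year` and `doi`; it treats an exact DOI match and an exact normalized-title-plus-year match as the two automatic passes, and treats a title similarity at or above an assumed 0.9 ratio (computed with the standard library's `difflib.SequenceMatcher`) as the semi-automatic pass, which only flags pairs for a person to confirm.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_title(title):
    """Fold case, accents and punctuation so trivially different titles compare equal."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())  # note: also drops non-Latin letters
    return " ".join(text.split())


def normalize_doi(doi):
    """Lower-case a DOI and strip a resolver prefix; return None if it is missing."""
    if not doi:
        return None
    doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", doi.strip().lower())
    return doi or None


def check_duplicates(sci, ei, threshold=0.9):
    """Return (confirmed, to_review): lists of (sci_index, ei_index) pairs."""
    ei_by_doi, ei_by_key = {}, {}
    for j, rec in enumerate(ei):
        doi = normalize_doi(rec.get("doi"))
        if doi:
            ei_by_doi.setdefault(doi, j)
        ei_by_key.setdefault((normalize_title(rec["title"]), rec["year"]), j)

    confirmed, to_review, used_ei, unmatched = [], [], set(), []
    for i, rec in enumerate(sci):
        doi = normalize_doi(rec.get("doi"))
        # Automatic pass 1 (assumed): identical DOI.
        j = ei_by_doi.get(doi) if doi else None
        # Automatic pass 2 (assumed): identical normalized title and same year.
        if j is None:
            j = ei_by_key.get((normalize_title(rec["title"]), rec["year"]))
        if j is not None and j not in used_ei:
            confirmed.append((i, j))
            used_ei.add(j)
        else:
            unmatched.append(i)

    # Semi-automatic pass (assumed): near-identical titles are only flagged;
    # a person accepts or rejects each flagged pair.
    for i in unmatched:
        title = normalize_title(sci[i]["title"])
        for j, rec in enumerate(ei):
            if j in used_ei:
                continue
            ratio = SequenceMatcher(None, title, normalize_title(rec["title"])).ratio()
            if ratio >= threshold:
                to_review.append((i, j))
    return confirmed, to_review


def fuse(sci, ei, pairs):
    """Build one de-duplicated sheet with a 'source' column marking overlaps."""
    sci_to_ei = dict(pairs)
    used_ei = set(sci_to_ei.values())
    sheet = []
    for i, rec in enumerate(sci):
        if i in sci_to_ei:
            # Hypothetical merge rule: SCI values win, EI fills empty fields.
            row = dict(ei[sci_to_ei[i]])
            row.update({k: v for k, v in rec.items() if v not in (None, "")})
            row["source"] = "SCI+EI"
        else:
            row = dict(rec, source="SCI")
        sheet.append(row)
    for j, rec in enumerate(ei):
        if j not in used_ei:
            sheet.append(dict(rec, source="EI"))
    return sheet
```

Near-identical titles can still belong to different papers (for example, parts I and II of a series), so in this sketch only exact matches are merged without review. The fusion step keeps every SCI row, marks overlapping rows as `SCI+EI`, and appends the EI rows that matched nothing; pairs a reviewer accepts from `to_review` can simply be added to `pairs` before calling `fuse`.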

Key words: Duplicate checking; Data fusion; EI; SCI; Software design
Received: 06 May 2014      Published: 18 December 2014
CLC number: G356

Cite this article:

Yu Jian, Xu Chen, Wang Meijun, Zhang Minhao, Yue Zhen'gan, Wu Xia, Zhao Chunmei. Design and Application of Data Fusion Software on Papers Indexed By SCI and EI. New Technology of Library and Information Service, 2014, 30(11): 79-87.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.11.12     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I11/79


Related articles:
[1] Beibei Kong, Jing Xie, Li Qian, Zhijun Chang, Zhenxin Wu. Methodology and Tools to Enrich Sci-Tech Big Data[J]. Data Analysis and Knowledge Discovery, 2019, 3(7): 113-122.
[2] Ming Yi, Tingting Zhang. Ranking Answer Quality of Popular Q&A Community[J]. Data Analysis and Knowledge Discovery, 2019, 3(6): 12-20.
[3] Junliang Yao, Xiaoqiu Le. Semantic Matching for Sci-Tech Novelty Retrieval[J]. Data Analysis and Knowledge Discovery, 2019, 3(6): 50-56.
[4] Yujie Cao, Jin Mao, Rongqing Pan, Zhichao Ba, Gang Li. Analyzing Characteristics of Interdisciplinary Research Evolutions: Case Study of Medical Informatics[J]. Data Analysis and Knowledge Discovery, 2019, 3(5): 107-116.
[5] Jianhua Liu, Zhixiong Zhang, Qin Zhang. Revealing Sci-Tech Policy Evolution with Entity Relationship[J]. Data Analysis and Knowledge Discovery, 2019, 3(5): 57-67.
[6] Jinzhu Zhang, Yiming Hu. Extracting Titles from Scientific References in Patents with Fusion of Representation Learning and Machine Learning[J]. Data Analysis and Knowledge Discovery, 2019, 3(5): 68-76.
[7] Xiaolan Wu, Chengzhi Zhang. Analysis of Knowledge Flow Based on Academic Social Networks: A Case Study of ScienceNet.cn[J]. Data Analysis and Knowledge Discovery, 2019, 3(4): 107-116.
[8] Xiang Li, Xiaodong Qian. Research on Impact of Commodity Online Evaluation for Consumption Convergence[J]. Data Analysis and Knowledge Discovery, 2019, 3(3): 102-111.
[9] Hongxia Xu, Chunwang Li. Review of Knowledge Extraction of Scientific Literature[J]. Data Analysis and Knowledge Discovery, 2019, 3(3): 14-24.
[10] Qingmin Liu, Changqing Yao, Chongde Shi, Xiaojie Wen, Yueying Sun. Vocabulary Optimization of Neural Machine Translation for Scientific and Technical Document[J]. Data Analysis and Knowledge Discovery, 2019, 3(3): 76-82.
[11] Juhua Wu, Yu Wang, Ming Li, Shaoyun Cai. Knowledge Discovery of Online Health Communities with Weighted Knowledge Network[J]. Data Analysis and Knowledge Discovery, 2019, 3(2): 108-117.
[12] Jian Li, Mingyue Wang, Luming Xu, Yingchun Tian. The Construction of Digital Medical Information Service Evaluation System Based on User Perceived Value[J]. Data Analysis and Knowledge Discovery, 2019, 3(2): 118-126.
[13] Ying Wang, Li Qian, Jing Xie, Zhijun Chang, Beibei Kong. Building Knowledge Graph with Sci-Tech Big Data[J]. Data Analysis and Knowledge Discovery, 2019, 3(1): 15-26.
[14] Li Qian, Jing Xie, Zhijun Chang, Zhenxin Wu, Dongrong Zhang. Designing Smart Knowledge Services with Sci-Tech Big Data[J]. Data Analysis and Knowledge Discovery, 2019, 3(1): 4-14.
[15] Jing Li, Xiao Liu, Xiaoli Wang. Financial Decision Knowledge Acquisition Based on Neighborhood Rough Set and Ensemble Classifiers with Grid Search[J]. Data Analysis and Knowledge Discovery, 2019, 3(1): 85-94.
Copyright © 2016 Data Analysis and Knowledge Discovery. Tel/Fax: (010) 82626611-6626, 82624938. E-mail: jishu@mail.las.ac.cn