New Technology of Library and Information Service, 2014, Vol. 30, Issue (11): 79-87. DOI: 10.11925/infotech.1003-3513.2014.11.12
Design and Application of Data Fusion Software on Papers Indexed By SCI and EI
Yu Jian¹, Xu Chen², Wang Meijun², Zhang Minhao², Yue Zhen'gan², Wu Xia³, Zhao Chunmei³
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
2 Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China;
3 Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
Abstract  

[Objective] This study designs software that performs duplicate checking and data fusion on papers indexed by SCI and EI. [Context] The software helps paper analysts obtain a dataset in a single, uniform format and meets the demands of micro-level analysis of subject information. [Methods] Two automatic algorithms and one semi-automatic algorithm are used to check accurately for duplicates between the papers indexed by SCI and those indexed by EI. Data fusion is based on a detailed analysis of the text features of the SCI and EI data fields. [Results] The software flags papers that are duplicated between SCI and EI and creates a de-duplicated, fused data sheet. [Conclusions] The software effectively solves the problem of constructing a dataset from different data sources, and its design ideas can also be applied to other databases.
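The abstract does not specify what the two automatic algorithms and the semi-automatic one are, so the following is only a hypothetical sketch of how such a pipeline could be structured, not the authors' software. It assumes DOI equality and exact normalized-title-plus-year equality as the two automatic checks, a `difflib` title-similarity threshold (0.9, chosen arbitrarily) as the semi-automatic step that queues pairs for human review, and a small field map onto a common schema for fusion. The SCI keys (TI, AU, SO, PY, DI) are Web of Science field tags; the EI column names and every function name here are assumptions.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical mapping onto a common schema. The SCI keys are Web of Science
# field tags; the EI column names are assumed, not taken from the paper.
SCI_MAP = {"TI": "title", "AU": "authors", "SO": "source", "PY": "year", "DI": "doi"}
EI_MAP = {"Title": "title", "Author": "authors", "Source title": "source",
          "Publication year": "year", "DOI": "doi"}


def to_common(record, field_map, origin):
    """Rename source-specific fields to the common schema and tag the origin."""
    out = {common: str(record.get(src, "")) for src, common in field_map.items()}
    out["origin"] = origin
    return out


def norm_title(title):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    t = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    t = re.sub(r"[^a-z0-9 ]", " ", t.lower())
    return " ".join(t.split())


def match(sci, ei, review_threshold=0.9):
    """Return "auto", "review", or None for one SCI/EI pair (assumed rules)."""
    # Assumed automatic rule 1: identical, non-empty DOI (case-insensitive).
    if sci["doi"] and sci["doi"].strip().lower() == ei["doi"].strip().lower():
        return "auto"
    t1, t2 = norm_title(sci["title"]), norm_title(ei["title"])
    # Assumed automatic rule 2: identical normalized title and same year.
    if t1 and t1 == t2 and sci["year"] == ei["year"]:
        return "auto"
    # Assumed semi-automatic rule: similar titles are queued for a human reviewer.
    if t1 and t2 and SequenceMatcher(None, t1, t2).ratio() >= review_threshold:
        return "review"
    return None


def fuse(sci_records, ei_records):
    """Merge SCI and EI records into one de-duplicated list plus a review queue."""
    sci = [to_common(r, SCI_MAP, "SCI") for r in sci_records]
    ei = [to_common(r, EI_MAP, "EI") for r in ei_records]
    fused, review = list(sci), []
    for e in ei:
        verdicts = [(match(s, e), s) for s in sci]
        auto_hits = [s for v, s in verdicts if v == "auto"]
        for s in auto_hits:
            s["origin"] = "SCI+EI"  # flag the paper as indexed by both databases
            for key, value in e.items():
                if not s.get(key) and value:  # fill empty SCI fields from EI
                    s[key] = value
        if not auto_hits:
            review.extend((s, e) for v, s in verdicts if v == "review")
            fused.append(e)  # kept until a reviewer decides it is a duplicate
    return fused, review
```

Keeping the SCI record as the base and filling only its empty fields from EI is one simple fusion policy. A production tool would probably need per-field rules (for example, for differing author-name formats), which is the kind of work the abstract attributes to its detailed analysis of each database's field text features.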

Key words: Duplicate checking; Data fusion; EI; SCI; Software design
Received: 06 May 2014      Published: 18 December 2014
CLC Number: G356

Cite this article:

Yu Jian, Xu Chen, Wang Meijun, Zhang Minhao, Yue Zhen'gan, Wu Xia, Zhao Chunmei. Design and Application of Data Fusion Software on Papers Indexed By SCI and EI. New Technology of Library and Information Service, 2014, 30(11): 79-87.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.11.12     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I11/79

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn