Design and Application of Data Fusion Software on Papers Indexed By SCI and EI
Yu Jian1, Xu Chen2, Wang Meijun2, Zhang Minhao2, Yue Zhen'gan2, Wu Xia3, Zhao Chunmei3
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
2 Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China;
3 Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
Abstract [Objective] This paper designs software that performs duplicate checking and data fusion on papers indexed by SCI and by EI. [Context] The software helps paper analysts obtain a dataset in a uniform format and meets the demands of micro-level analysis of subject information. [Methods] Two automatic algorithms and one semi-automatic algorithm are used to check accurately for duplicates among the papers indexed by SCI and EI. Data fusion is based on a detailed analysis of the text features of the SCI and EI data fields. [Results] The software marks papers that appear in both the SCI and EI sets and creates a de-duplicated, fused data sheet. [Conclusions] The software effectively solves the problem of constructing a dataset from different data sources, and its design ideas can also be applied to other databases.
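The abstract does not specify the three duplicate-checking algorithms, so the following is only a minimal sketch of how two automatic passes and one semi-automatic pass might be combined. Everything in it is an assumption made for illustration: the record fields (`title`, `doi`), the choice of a DOI match and a normalized-title match as the automatic passes, the use of a title-similarity score to flag candidates for manual review, and the hypothetical names `normalize_title`, `check_duplicates`, and `review_threshold`.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def check_duplicates(sci_records, ei_records, review_threshold=0.9):
    """Compare each EI record against the SCI records.

    Returns (duplicates, needs_review, unique_ei). The first two are lists
    of (sci_index, ei_index) pairs; unique_ei is a list of EI indices.
    The matching rules are illustrative assumptions, not the paper's method.
    """
    doi_index = {r["doi"].lower(): i
                 for i, r in enumerate(sci_records) if r.get("doi")}
    title_index = {normalize_title(r["title"]): i
                   for i, r in enumerate(sci_records)}

    duplicates, needs_review, unique_ei = [], [], []
    for j, ei in enumerate(ei_records):
        doi = (ei.get("doi") or "").lower()
        title = normalize_title(ei["title"])
        if doi and doi in doi_index:
            # Automatic pass 1 (assumed): identical DOI.
            duplicates.append((doi_index[doi], j))
        elif title in title_index:
            # Automatic pass 2 (assumed): identical normalized title.
            duplicates.append((title_index[title], j))
        else:
            # Semi-automatic pass (assumed): near matches go to an analyst.
            best_i, best_score = None, 0.0
            for sci_title, i in title_index.items():
                score = SequenceMatcher(None, title, sci_title).ratio()
                if score > best_score:
                    best_i, best_score = i, score
            if best_score >= review_threshold:
                needs_review.append((best_i, j))
            else:
                unique_ei.append(j)
    return duplicates, needs_review, unique_ei


def fuse(sci_records, ei_records, unique_ei):
    """Build a de-duplicated list: all SCI records plus EI-only records."""
    return list(sci_records) + [ei_records[j] for j in unique_ei]
```

In this sketch, pairs in `needs_review` are left out of the fused list until an analyst confirms or rejects them, which is one way to read "semi-automatic"; the real software may handle that step differently.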
Received: 06 May 2014
Published: 18 December 2014