New Technology of Library and Information Service  2014, Vol. 30 Issue (11): 79-87     https://doi.org/10.11925/infotech.1003-3513.2014.11.12
Application Practice
Design and Application of Data Fusion Software on Papers Indexed By SCI and EI
Yu Jian1, Xu Chen2, Wang Meijun2, Zhang Minhao2, Yue Zhen'gan2, Wu Xia3, Zhao Chunmei3
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
2 Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China;
3 Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
Abstract

[Objective] This paper designs software that detects duplicate records and fuses data for papers indexed by SCI and EI. [Context] The software helps analysts obtain a fused dataset of SCI and EI records in a unified format, meeting the need of fine-grained subject intelligence analysis to flexibly build datasets from multiple journal literature sources. [Methods] Two automatic algorithms and one semi-automatic algorithm are used to detect duplicates accurately between SCI and EI records. Data fusion is based on a detailed analysis of the text features of the full-record fields of both databases. [Results] The software automatically marks records duplicated between SCI and EI and generates a de-duplicated fused data table. [Conclusions] The software effectively solves the problem of building a unified analysis dataset from two different journal literature sources, and its design ideas can also be applied to other databases.

Keywords: Duplicate checking; Data fusion; EI; SCI; Software design
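The abstract does not specify the two automatic algorithms or the semi-automatic one, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each record is a dictionary with hypothetical `title` and `doi` keys, treats exact DOI matches and exact normalized-title matches as automatic duplicates, and queues near-identical titles for manual review. The function names and the `review_threshold` value are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase a title and collapse each non-alphanumeric run to one space."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def find_duplicates(sci_records, ei_records, review_threshold=0.9):
    """Compare SCI records against EI records.

    Returns (auto, review):
      auto   -- (sci_index, ei_index) pairs matched by DOI or normalized title
      review -- (sci_index, ei_index, score) candidates for manual checking
    """
    ei_by_doi, ei_by_title = {}, {}
    for j, rec in enumerate(ei_records):
        doi = rec.get("doi", "").strip().lower()
        if doi:
            ei_by_doi.setdefault(doi, j)
        ei_by_title.setdefault(normalize_title(rec["title"]), j)

    auto, review = [], []
    for i, rec in enumerate(sci_records):
        doi = rec.get("doi", "").strip().lower()
        title = normalize_title(rec["title"])
        # Automatic rule 1 (assumed): identical DOI.
        j = ei_by_doi.get(doi) if doi else None
        # Automatic rule 2 (assumed): identical normalized title.
        if j is None:
            j = ei_by_title.get(title)
        if j is not None:
            auto.append((i, j))
            continue
        # Semi-automatic rule (assumed): best fuzzy title match goes to review.
        best_j, best_score = None, 0.0
        for k, ei_rec in enumerate(ei_records):
            score = SequenceMatcher(None, title, normalize_title(ei_rec["title"])).ratio()
            if score > best_score:
                best_j, best_score = k, score
        if best_j is not None and best_score >= review_threshold:
            review.append((i, best_j, round(best_score, 3)))
    return auto, review


def fuse(sci_records, ei_records, auto_pairs):
    """Build one fused table: every SCI record plus the EI records not auto-matched."""
    sci_matched = {i for i, _ in auto_pairs}
    ei_matched = {j for _, j in auto_pairs}
    fused = []
    for i, rec in enumerate(sci_records):
        source = "SCI+EI" if i in sci_matched else "SCI"
        fused.append({"title": rec["title"], "doi": rec.get("doi", ""), "source": source})
    for j, rec in enumerate(ei_records):
        if j not in ei_matched:
            fused.append({"title": rec["title"], "doi": rec.get("doi", ""), "source": "EI"})
    return fused
```

In this sketch, pairs in the review list stay as separate rows in the fused table until an analyst confirms them, which mirrors the semi-automatic step only in spirit. A real implementation would also need to map the differing SCI and EI field layouts onto a common schema.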
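Received: 2014-05-06      Published: 2014-12-18
Chinese Library Classification (CLC) number: G356
Funding:

This work is one of the research outcomes of the Young Talent Frontier Project of the National Science Library, Chinese Academy of Sciences, "Optimized Design of Auxiliary Tools for Subject-Oriented Knowledge Services" (Project No. 青1209).

Corresponding author: Yu Jian     E-mail: yuj@mail.las.ac.cn
Author contributions: Yu Jian: software functional design and implementation, and manuscript writing; Xu Chen, Wang Meijun, Zhang Minhao, Yue Zhen'gan, Zhao Chunmei, Wu Xia: participated in software requirements analysis and software testing.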
Cite this article:
Yu Jian, Xu Chen, Wang Meijun, Zhang Minhao, Yue Zhen'gan, Wu Xia, Zhao Chunmei. Design and Application of Data Fusion Software on Papers Indexed By SCI and EI. New Technology of Library and Information Service, 2014, 30(11): 79-87.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.11.12      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2014/V30/I11/79

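Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing     Postal code: 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn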