Research on Duplicated Literature Deletion Method Based on Cross-database Search
Hao Dan1, Zhou Jinhui1,2, Guan Bei2, Wang Yanxi2, Han Jixin3
1. School of Economics and Management, Xidian University, Xi'an 710071, China;
2. Institute of Software, Chinese Academy of Sciences, Beijing 100190, China;
3. North China Electric Power Material Company, Beijing 100075, China
Abstract: Taking publication statistics by author and affiliation as its background, this paper analyzes the specific causes of data redundancy in cross-database searching. It proposes four duplicate-removal methods: Cross Chinese Database ID, Cross English Database ID, DOI, and "Title & Type". Applied in literature statistics work, these methods proved effective and better resolve the redundancy among records retrieved from different databases.
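As a rough illustration of the DOI and "Title & Type" methods named in the abstract, the sketch below removes duplicates by key matching. It is a minimal example under stated assumptions, not the authors' implementation: the record fields `doi`, `title`, and `type` and the helper names are hypothetical, the title normalization rule is a guess at a reasonable one, and the two database-ID methods are omitted because the abstract does not describe them.

```python
import unicodedata


def normalize_title(title: str) -> str:
    """Fold case and character width, then keep only letters and digits."""
    folded = unicodedata.normalize("NFKC", title).casefold()
    return "".join(ch for ch in folded if ch.isalnum())


def dedup_keys(record: dict) -> list:
    """Build every key under which a record may match an earlier one.

    The field names 'doi', 'title', and 'type' are assumptions made for
    this sketch; real database exports will use their own field names.
    """
    keys = []
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        keys.append(("doi", doi))
    title = normalize_title(record.get("title") or "")
    if title:
        doc_type = (record.get("type") or "").strip().lower()
        keys.append(("title_type", title, doc_type))
    return keys


def remove_duplicates(records: list) -> list:
    """Keep the first record seen for each DOI or (title, type) key."""
    seen = set()
    unique = []
    for record in records:
        keys = dedup_keys(record)
        if any(key in seen for key in keys):
            continue  # matches an earlier record on at least one key
        seen.update(keys)
        unique.append(record)
    return unique
```

Pairing the title with the document type means that, for example, a thesis and a journal article sharing one title are not merged, while a record lacking a DOI can still be matched by title and type.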
Hao Dan, Zhou Jinhui, Guan Bei, Wang Yanxi, Han Jixin. Research on Duplicated Literature Deletion Method Based on Cross-database Search[J]. New Technology of Library and Information Service, 2011, 27(7/8): 116-120.