New Technology of Library and Information Service  2011, Vol. 27 Issue (7/8): 116-120    DOI: 10.11925/infotech.1003-3513.2011.07-08.19
Research on Duplicated Literature Deletion Method Based on Cross-database Search
Hao Dan1, Zhou Jinhui1,2, Guan Bei2, Wang Yanxi2, Han Jixin3
1. School of Economics and Management, Xidian University, Xi'an 710071, China;
2. Institute of Software, Chinese Academy of Sciences, Beijing 100190, China;
3. North China Electric Power Material Company, Beijing 100075, China
Abstract  This paper is set against the task of compiling publication statistics by author and affiliation. It analyzes the specific causes of data redundancy in cross-database searching and proposes four duplicate removal methods: Cross Chinese Database ID, Cross English Database ID, DOI, and “Title & Type”. Applied to literature statistics work, these methods proved effective at resolving the redundancy that arises when the same publication is retrieved from different databases.
Key words  Cross-database searching      Duplicate removal strategy      Literature information
Received: 16 May 2011      Published: 09 October 2011
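
As an illustration only, two of the four strategies named in the abstract (DOI and “Title & Type”) might be sketched as follows. The record fields (`doi`, `title`, `type`), the normalization rules, and all function names are assumptions made for this sketch, not the authors' implementation. The two database-ID strategies are left out because the abstract does not describe how they work.

```python
import re
import unicodedata


def normalize_doi(doi):
    """Lowercase a DOI and strip a resolver or 'doi:' prefix, if present."""
    doi = doi.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)


def normalize_title(title):
    """Fold case and character width, then keep only letters and digits."""
    folded = unicodedata.normalize("NFKC", title).casefold()
    return "".join(ch for ch in folded if ch.isalnum())


def record_keys(record):
    """Return every matching key this record can offer (hypothetical fields)."""
    keys = []
    if record.get("doi"):
        keys.append(("doi", normalize_doi(record["doi"])))
    if record.get("title"):
        doc_type = (record.get("type") or "").casefold()
        keys.append(("title_type", normalize_title(record["title"]), doc_type))
    return keys


def remove_duplicates(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        keys = record_keys(record)
        is_duplicate = any(key in seen for key in keys)
        # Remember all keys, so a later record can match on either one.
        seen.update(keys)
        if not is_duplicate:
            unique.append(record)
    return unique
```

Under these assumptions, a record that carries a DOI in one database and only a title and document type in another is still matched, because every record registers all the keys it has.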



Cite this article:

Hao Dan, Zhou Jinhui, Guan Bei, Wang Yanxi, Han Jixin. Research on Duplicated Literature Deletion Method Based on Cross-database Search. New Technology of Library and Information Service, 2011, 27(7/8): 116-120.

