New Technology of Library and Information Service  2011, Vol. 27 Issue (7/8): 116-120    DOI: 10.11925/infotech.1003-3513.2011.07-08.19
Research on Duplicated Literature Deletion Method Based on Cross-database Search
Hao Dan (1), Zhou Jinhui (1, 2), Guan Bei (2), Wang Yanxi (2), Han Jixin (3)
1. School of Economics and Management, Xidian University, Xi'an 710071, China;
2. Institute of Software, Chinese Academy of Sciences, Beijing 100190, China;
3. North China Electric Power Material Company, Beijing 100075, China
Abstract  Taking the compilation of publication statistics by author and affiliation as its background, this paper analyzes the specific causes of data redundancy in cross-database searching. It proposes four duplicate removal methods (Cross Chinese Database ID, Cross English Database ID, DOI, and “Title & Type”) and applies them effectively in literature statistics work, where they better resolve the redundancy that arises between different databases.
Key words: Cross-database searching; Duplicate removal strategy; Literature information
Received: 16 May 2011      Published: 09 October 2011



Hao Dan, Zhou Jinhui, Guan Bei, Wang Yanxi, Han Jixin. Research on Duplicated Literature Deletion Method Based on Cross-database Search. New Technology of Library and Information Service, 2011, 27(7/8): 116-120.

