Please wait a minute...
New Technology of Library and Information Service  2004, Vol. 20 Issue (8): 61-65    DOI: 10.11925/infotech.1003-3513.2004.08.16
Current Issue | Archive | Adv Search |
Improve the d-gaps Technique for Inverted File Compression: Document Identifier Reassignment
Zhang Aihong
(Department of Information Management, Sichuan University, Chengdu 610064,China)
Download: PDF (0 KB)  
Export: BibTeX | EndNote (RIS)      
Abstract  

The inverted file is the most popular indexing mechanism in an information retrieval system. Compressing an inverted file can greatly improve document search rate and save disk space. Traditionally, the d-gaps technique is used in the inverted file compression by replacing document identifiers with usually much smaller gap values. However, fluctuating gap values cannot be efficiently compressed by some well-known prefix-free codes. In this paper, a document identifier reassignment algorithm is proposed to reduce the gap values. Simulation results show that the average gap values of the inverted files can be reduced effectively.

Key wordsInverted file      d-gaps      Document identifier reassignment      TSP (Traveling Salesman Problem)      The greedy algorithm       MaxST     
Received: 08 March 2004      Published: 25 August 2004
ZTFLH: 

G353.2

 
Corresponding Authors: Zhang Aihong     E-mail: qianchianghong@163.com
About author:: Zhang Aihong

Cite this article:

Zhang Aihong. Improve the d-gaps Technique for Inverted File Compression: Document Identifier Reassignment. New Technology of Library and Information Service, 2004, 20(8): 61-65.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2004.08.16     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2004/V20/I8/61

1  Andrea Garratt,Mike Jackson,Peter Burden,Jon Wallis. A survey of alternative designs for a search engine storage structure: Information and Sostware Technology 43(2001) 661-677
2  Moffat, A.,Lang Stuiver. Binary interpolative Coding for Effective Index Compression: Information Retrieval 3(2000) 25-47
3  Navarro, Gonzalov. Adding compression to Block Addressing inverted Indexes:  Information Retrieval 2000(07) 49-77
4  Trotman, Andrew. Compressing Inverted Files: Information Retrieval;2003(01) 5-19
5  Moffat, A., & Zobel, J. (1996). Self-indexing inverted files for fast text retrieval: Information Syst., 14(4) 249-279
6  He X. Yesha Y. A nearly optimal parallel algorithm for constructing depth first spanning trees in planar graphs: SIAM J Comput., 1988, 174-185
7  Djidjev H V,Pantziou G E,Zaroliagis C D. Computing shortest paths and distances in planar graphs. In. FOCS' 87(1987) 238-248
8  Gregory Gutin, Anders Yeo, Alexey Zverovich. Traveling salesman should not be greedy: domination analysis of greedy-type heuristics for the TSP. Discrete Applied Mathematics 117(2002) 81-86
9  A.I. Serdjukov. On finding a maximum spanning tree of bounded radius: Discrete Applied Mathematics 114(2001) 249-253
10  Lawler, E.L., Lenstra, J.K., Rinnooy, A.H.G., & Shmoys, D.B. The Traveling Salesman Problem—A Guided Tour of Combinatorial Optimization. New York: Wiley-Interscience Publication.
11  陈国良,梁维发,沈鸿.并行图论算法研究进展.计算机研究与发展,1995(9)
12  潘立登,黄晓峰.用启发式贪心法求解旅行商问题.北京化工大学学报,1998(2)
13  姚朝灼.顶点覆盖问题的贪心算法的设计与分析.福州大学学报,2001(1)
14  段禅伦,斯勤夫.关于旅行推销员问题的一个算法.内蒙古大学学报,2001(6)
15  姜洪溪,陈丹.利用找环去边法求最小生成树的算法探析.襄樊学院学报,2002(9)
16  陈建二,王伟平,张祖平.关于实际构造最大带宽路径算法的研究.计算机学报,2002(10)

[1] Zhang Yanguo,Ye Feng. The Study and Realization of Dictionary Software s Full-text Retrieval[J]. 现代图书情报技术, 2004, 20(4): 37-39.
[2] Xiang Guilin,Liu Jinhua. Research and Completion on Dynamic Index Technology in Full Text Retrieval[J]. 现代图书情报技术, 2003, 19(3): 51-54.
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn