New Technology of Library and Information Service  2004, Vol. 20 Issue (8): 61-65    DOI: 10.11925/infotech.1003-3513.2004.08.16
Improve the d-gaps Technique for Inverted File Compression: Document Identifier Reassignment
Zhang Aihong
(Department of Information Management, Sichuan University, Chengdu 610064, China)
The inverted file is the most popular indexing mechanism in information retrieval systems. Compressing an inverted file can greatly improve document search speed and save disk space. Traditionally, the d-gaps technique is used in inverted file compression: each document identifier in a posting list is replaced by its difference from the preceding identifier, which is usually a much smaller value. However, gap values that fluctuate widely cannot be compressed efficiently by some well-known prefix-free codes. In this paper, a document identifier reassignment algorithm is proposed to reduce the gap values. Simulation results show that the average gap value of the inverted files can be reduced effectively.
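
As an illustration of the d-gaps idea only, and not the reassignment algorithm proposed in the paper, the following minimal Python sketch converts a sorted posting list to gaps and estimates its size under the Elias gamma code, one well-known prefix-free code. The function names (to_dgaps, from_dgaps, gamma_bits, posting_cost) and the sample posting lists are assumptions made for this example; document identifiers are assumed to be positive integers in strictly increasing order.

```python
from typing import List


def to_dgaps(doc_ids: List[int]) -> List[int]:
    """Replace each identifier with its difference from the previous one.

    The first gap is taken relative to 0, so it equals the first identifier.
    """
    gaps = []
    previous = 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_dgaps(gaps: List[int]) -> List[int]:
    """Recover the original identifiers by taking running sums of the gaps."""
    doc_ids = []
    total = 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def gamma_bits(n: int) -> int:
    """Bit length of the Elias gamma code of a positive integer n.

    Gamma coding uses 2 * floor(log2(n)) + 1 bits.
    """
    return 2 * (n.bit_length() - 1) + 1


def posting_cost(doc_ids: List[int]) -> int:
    """Total gamma-coded size, in bits, of a posting list stored as d-gaps."""
    return sum(gamma_bits(gap) for gap in to_dgaps(doc_ids))


# Hypothetical posting list for one term, before any reassignment.
scattered = [3, 50, 51, 120]
# The same four documents after a hypothetical renumbering that places them
# next to each other in identifier space.
clustered = [1, 2, 3, 4]

print(to_dgaps(scattered), posting_cost(scattered))
print(to_dgaps(clustered), posting_cost(clustered))
```

In this toy case the scattered list yields large, uneven gaps, while the renumbered list yields gaps of 1, which is the effect that identifier reassignment aims for across the whole collection.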

Key words: Inverted file; d-gaps; Document identifier reassignment; TSP (Traveling Salesman Problem); The greedy algorithm; MaxST
Received: 08 March 2004      Published: 25 August 2004


Corresponding author: Zhang Aihong
About author: Zhang Aihong

Cite this article:

Zhang Aihong. Improve the d-gaps Technique for Inverted File Compression: Document Identifier Reassignment. New Technology of Library and Information Service, 2004, 20(8): 61-65.

