Please wait a minute...
Advanced Search
现代图书情报技术  2004, Vol. 20 Issue (8): 61-65     https://doi.org/10.11925/infotech.1003-3513.2004.08.16
  网络资源与建设 本期目录 | 过刊浏览 | 高级检索 |
倒排文档压缩技巧d-gaps的改进——文档标识号重置法
张爱红
(四川大学信息管理系  成都  610064)
Improve the d-gaps Technique for Inverted File Compression: Document Identifier Reassignment
Zhang Aihong
(Department of Information Management, Sichuan University, Chengdu 610064,China)
全文:
输出: BibTeX | EndNote (RIS)      
摘要 

倒排文档是信息检索系统中最普遍使用的索引机制,而索引文件的压缩能大大提高检索速度和节约磁盘空间。倒排文件压缩的传统做法是文档(标识号)间距法(d-gaps)。然而,剧烈变化的间距值并不能被著名的前缀自由代码有效编码压缩。为了使间距值得到有效的压缩,本文设计了一个文档标识号重置法。模拟试验表明能更有效压缩d-gaps倒排文档。

服务
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章
关键词 倒排文档文档(标识号)间距法文档标识号重置法旅行商问题贪心算法最大生成树    
Abstract

The inverted file is the most popular indexing mechanism in an information retrieval system. Compressing an inverted file can greatly improve document search rate and save disk space. Traditionally, the d-gaps technique is used in the inverted file compression by replacing document identifiers with usually much smaller gap values. However, fluctuating gap values cannot be efficiently compressed by some well-known prefix-free codes. In this paper, a document identifier reassignment algorithm is proposed to reduce the gap values. Simulation results show that the average gap values of the inverted files can be reduced effectively.

Key wordsInverted file    d-gaps    Document identifier reassignment    TSP (Traveling Salesman Problem)    The greedy algorithm     MaxST
收稿日期: 2004-03-08      出版日期: 2004-08-25
: 

G353.2

 
通讯作者: 张爱红     E-mail: qianchianghong@163.com
作者简介: 张爱红
引用本文:   
张爱红. 倒排文档压缩技巧d-gaps的改进——文档标识号重置法[J]. 现代图书情报技术, 2004, 20(8): 61-65.
Zhang Aihong. Improve the d-gaps Technique for Inverted File Compression: Document Identifier Reassignment. New Technology of Library and Information Service, 2004, 20(8): 61-65.
链接本文:  
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2004.08.16      或      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2004/V20/I8/61

1  Andrea Garratt,Mike Jackson,Peter Burden,Jon Wallis. A survey of alternative designs for a search engine storage structure: Information and Sostware Technology 43(2001) 661-677
2  Moffat, A.,Lang Stuiver. Binary interpolative Coding for Effective Index Compression: Information Retrieval 3(2000) 25-47
3  Navarro, Gonzalov. Adding compression to Block Addressing inverted Indexes:  Information Retrieval 2000(07) 49-77
4  Trotman, Andrew. Compressing Inverted Files: Information Retrieval;2003(01) 5-19
5  Moffat, A., & Zobel, J. (1996). Self-indexing inverted files for fast text retrieval: Information Syst., 14(4) 249-279
6  He X. Yesha Y. A nearly optimal parallel algorithm for constructing depth first spanning trees in planar graphs: SIAM J Comput., 1988, 174-185
7  Djidjev H V,Pantziou G E,Zaroliagis C D. Computing shortest paths and distances in planar graphs. In. FOCS' 87(1987) 238-248
8  Gregory Gutin, Anders Yeo, Alexey Zverovich. Traveling salesman should not be greedy: domination analysis of greedy-type heuristics for the TSP. Discrete Applied Mathematics 117(2002) 81-86
9  A.I. Serdjukov. On finding a maximum spanning tree of bounded radius: Discrete Applied Mathematics 114(2001) 249-253
10  Lawler, E.L., Lenstra, J.K., Rinnooy, A.H.G., & Shmoys, D.B. The Traveling Salesman Problem—A Guided Tour of Combinatorial Optimization. New York: Wiley-Interscience Publication.
11  陈国良,梁维发,沈鸿.并行图论算法研究进展.计算机研究与发展,1995(9)
12  潘立登,黄晓峰.用启发式贪心法求解旅行商问题.北京化工大学学报,1998(2)
13  姚朝灼.顶点覆盖问题的贪心算法的设计与分析.福州大学学报,2001(1)
14  段禅伦,斯勤夫.关于旅行推销员问题的一个算法.内蒙古大学学报,2001(6)
15  姜洪溪,陈丹.利用找环去边法求最小生成树的算法探析.襄樊学院学报,2002(9)
16  陈建二,王伟平,张祖平.关于实际构造最大带宽路径算法的研究.计算机学报,2002(10)

[1] 向桂林,刘锦华. 全文检索系统中动态索引技术的研究与实现[J]. 现代图书情报技术, 2003, 19(3): 51-54.
[2] 邓汉成. 福岛~+──一种改进的福岛方法[J]. 现代图书情报技术, 1994, 10(3): 40-44.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
版权所有 © 2015 《数据分析与知识发现》编辑部
地址:北京市海淀区中关村北四环西路33号 邮编:100190
电话/传真:(010)82626611-6626,82624938
E-mail:jishu@mail.las.ac.cn