ACM Home Page
Please provide us with feedback. Feedback
Parameterised compression for sparse bitmaps
Full text PdfPdf (1.00 MB)
Source Annual ACM Conference on Research and Development in Information Retrieval archive
Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval table of contents
Copenhagen, Denmark
Pages: 274 - 285  
Year of Publication: 1992
ISBN:0-89791-523-2
Authors
Alistair Moffat  Department of Computer Science, The University of Melbourne, Parkville 3052, Australia
Justin Zobel  Department of Computer Science, Royal Melbourne Institute of Technology, GPO Box 2476V, Melbourne 3001, Australia
Sponsors
Royal School of Lib. : Royal School of Lib.
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 52,   Citation Count: 21
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/133160.133210
What is a DOI?

ABSTRACT

Full-text retrieval systems often use either a bitmap or an inverted file to identify which documents contain which terms, so that the documents containing any combination of query terms can be quickly located. Bitmaps of term occurrences are large, but are usually sparse, and thus are amenable to a variety of compression techniques. Here we consider techniques in which the encoding of each bitvector within the bitmap is parameterised, so that a different code can be used for each bitvector. Our experimental results show that the new methods yield better compression than previous techniques.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
A. Bookstein and S.T. Klein. Flexible compression for bitmap sets. In j.A. Storer and J.H. Reif, editors~ Proc. IEEE Data Compression Conference, pages 402-410, Snowbird, Utah, April 1991.
3
 
4
A. Bookstein, 8.T. Klein, and T. Raita. Model based concordance compression. In J.A. Storer and J.H. Reif, editors, Proc. IEEE Data Compression Conference, Snowbird, Utah, March 1992. To appear.
5
6
7
 
8
P. Elias. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, IT-21:194-203, March 1975.
 
9
A.S. Fraenkel and $.T. Klein. Novel compression of sparse bit-strings~Preliminary report. In A. Apostolico and Z. Galil, editors, Combinatorial Algorithms on Words, Volume 12, NATO ASI Series F, pages 169-183, Berlin, 1985. Springer-Verlag.
 
10
R.G. Gallager and D.C. Van Voorhis. OptimM source codes for geometrically distributed alphabets. IEEE Transactions on Information Theory, IT-21(2):228-230, March 1975.
 
11
S.W. Golomb. Run-length encodings. IEEE Transactions on Information Theory, IT- 12(3):399-401, July 1966.
 
12
D.A. Huffman. A method for the construction of minimum redundancy codes. Proc. IRE, 40(9):1098-1101, September 1952.
 
13
M. Jakobsson. Huffman coding in bit-vector compression. Information Processing Letters, 7(6):304-307, October 1978.
 
14
M. Lesk. Some applications of inverted indexes on the Unix system. In Unix Programmers Manual, Volume 214. Bell Laboratories, Murray Hill, New Jersey, 1988.
 
15
M.D. Mcilroy. Development of a spelling fist. IEEE Transactions on Communications, COM-30(1):91-99, January 1982.
 
16
A.M. Moffat. Economical inversion of large text files. Computing Systems, 5(2), 1992. To appear.
 
17
A.M. Moffat and J. Zobel. Coding for compression in full-text retrieval systems. In J.A. Storer and J.H. Reif, editors, Proc. IEEE Data Compression Conference, Snowbird, Utah, March 1992. To appear.
 
18
 
19
E.J. Schuegraf. Compression of large inverted files with hyperbolic term distribution. Information Processing ~Z Management, 12:377- 384, 1976.
 
20
Z. Somogyi. The Melbourne University bibliography system. Technical Report 90/3, Department of Computer Science, The University of Melbourne, Parkville, Victoria 3052, Australia, March 1990.
 
21
J. Teuhola. A compression method for clustered bit-vectors. Information Processing Letters, 7(6):308-311, October 1978.
 
22
B. Tuthill. Refer~a bibliography system. In Unix User's Manual Supplementary Documents. 4.2 Berkeley Software Distribution, Berkeley, California, 1984.
 
23
I.H. Witten, T.C. Bell, and C. Nevill. Models for compression in full-text retrieval systems. In J.A. Storer and J.H. Reif, editors, Proc. IEEE Data Compression Conference, pages 23-32, Snowbird, Utah, April 1991.
24
 
25
J. Zobel and A.M. Moffat. Adding compression to a full-text retrieval system. In Proc. 15'th Australian Computer Science Conference, pages 1077-1089, Hobart, January 1992. University of Tasmania.
 
26
J. Zobel, A.M. Moffat, and It. Sacks-Davis. An efficient indexing technique for full-text database systems. Technical Report 92/21, Collaborative Information Technology Research Institute, Department of Computer Science, Royal Melbourne Institute of Technology, Australia, February 1992.

CITED BY  21

Collaborative Colleagues:
Alistair Moffat: colleagues
Justin Zobel: colleagues