|
ABSTRACT
Full-text retrieval systems often use either a bitmap or an inverted file to identify which documents contain which terms, so that the documents containing any combination of query terms can be quickly located. Bitmaps of term occurrences are large, but are usually sparse, and thus are amenable to a variety of compression techniques. Here we consider techniques in which the encoding of each bitvector within the bitmap is parameterised, so that a different code can be used for each bitvector. Our experimental results show that the new methods yield better compression than previous techniques.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
A. Bookstein and S.T. Klein. Flexible compression for bitmap sets. In j.A. Storer and J.H. Reif, editors~ Proc. IEEE Data Compression Conference, pages 402-410, Snowbird, Utah, April 1991.
|
 |
3
|
|
| |
4
|
A. Bookstein, 8.T. Klein, and T. Raita. Model based concordance compression. In J.A. Storer and J.H. Reif, editors, Proc. IEEE Data Compression Conference, Snowbird, Utah, March 1992. To appear.
|
 |
5
|
|
 |
6
|
A. S. Fraenkel , S. T. Klein , Y. Choueka , E. Segal, Improved hierarchical bit-vector compression in document retrieval systems, Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval, p.88-96, September 1986, Palazzo dei Congressi, Pisa, Italy
[doi> 10.1145/253168.253190]
|
 |
7
|
|
| |
8
|
P. Elias. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, IT-21:194-203, March 1975.
|
| |
9
|
A.S. Fraenkel and $.T. Klein. Novel compression of sparse bit-strings~Preliminary report. In A. Apostolico and Z. Galil, editors, Combinatorial Algorithms on Words, Volume 12, NATO ASI Series F, pages 169-183, Berlin, 1985. Springer-Verlag.
|
| |
10
|
R.G. Gallager and D.C. Van Voorhis. OptimM source codes for geometrically distributed alphabets. IEEE Transactions on Information Theory, IT-21(2):228-230, March 1975.
|
| |
11
|
S.W. Golomb. Run-length encodings. IEEE Transactions on Information Theory, IT- 12(3):399-401, July 1966.
|
| |
12
|
D.A. Huffman. A method for the construction of minimum redundancy codes. Proc. IRE, 40(9):1098-1101, September 1952.
|
| |
13
|
M. Jakobsson. Huffman coding in bit-vector compression. Information Processing Letters, 7(6):304-307, October 1978.
|
| |
14
|
M. Lesk. Some applications of inverted indexes on the Unix system. In Unix Programmers Manual, Volume 214. Bell Laboratories, Murray Hill, New Jersey, 1988.
|
| |
15
|
M.D. Mcilroy. Development of a spelling fist. IEEE Transactions on Communications, COM-30(1):91-99, January 1982.
|
| |
16
|
A.M. Moffat. Economical inversion of large text files. Computing Systems, 5(2), 1992. To appear.
|
| |
17
|
A.M. Moffat and J. Zobel. Coding for compression in full-text retrieval systems. In J.A. Storer and J.H. Reif, editors, Proc. IEEE Data Compression Conference, Snowbird, Utah, March 1992. To appear.
|
| |
18
|
|
| |
19
|
E.J. Schuegraf. Compression of large inverted files with hyperbolic term distribution. Information Processing ~Z Management, 12:377- 384, 1976.
|
| |
20
|
Z. Somogyi. The Melbourne University bibliography system. Technical Report 90/3, Department of Computer Science, The University of Melbourne, Parkville, Victoria 3052, Australia, March 1990.
|
| |
21
|
J. Teuhola. A compression method for clustered bit-vectors. Information Processing Letters, 7(6):308-311, October 1978.
|
| |
22
|
B. Tuthill. Refer~a bibliography system. In Unix User's Manual Supplementary Documents. 4.2 Berkeley Software Distribution, Berkeley, California, 1984.
|
| |
23
|
I.H. Witten, T.C. Bell, and C. Nevill. Models for compression in full-text retrieval systems. In J.A. Storer and J.H. Reif, editors, Proc. IEEE Data Compression Conference, pages 23-32, Snowbird, Utah, April 1991.
|
 |
24
|
|
| |
25
|
J. Zobel and A.M. Moffat. Adding compression to a full-text retrieval system. In Proc. 15'th Australian Computer Science Conference, pages 1077-1089, Hobart, January 1992. University of Tasmania.
|
| |
26
|
J. Zobel, A.M. Moffat, and It. Sacks-Davis. An efficient indexing technique for full-text database systems. Technical Report 92/21, Collaborative Information Technology Research Institute, Department of Computer Science, Royal Melbourne Institute of Technology, Australia, February 1992.
|
CITED BY 21
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
David Johnson , Shankar Krishnan , Jatin Chhugani , Subodh Kumar , Suresh Venkatasubramanian, Compressing large boolean matrices using reordering techniques, Proceedings of the Thirtieth international conference on Very large data bases, p.13-23, August 31-September 03, 2004, Toronto, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|