| Improved hierarchical bit-vector compression in document retrieval systems |
| Full text |
Pdf
(826 KB)
|
| Source
|
Annual ACM Conference on Research and Development in Information Retrieval
archive
Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval
table of contents
Palazzo dei Congressi, Pisa, Italy
Pages: 88 - 96
Year of Publication: 1986
ISBN:0-89791-187-3
|
|
Authors
|
|
A. S. Fraenkel
|
Department of Applied Mathematics, The Weismann Institute of Science, Rehovot 76100, Israel
|
|
S. T. Klein
|
Department of Applied Mathematics, The Weismann Institute of Science, Rehovot 76100, Israel
|
|
Y. Choueka
|
Inst. for Information Retrieval and Computational Linguistics (IRCOL) - The Responsa Project and Department of Mathematics and Computer Science, Bar-Ilan University, Ramat Gan, Israel
|
|
E. Segal
|
Inst. for Information Retrieval and Computational Linguistics (IRCOL) - The Responsa Project
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 7, Downloads (12 Months): 52, Citation Count: 12
|
|
|
ABSTRACT
The “concordance” of an information retrieval system can often be stored in form of bit-maps, which are usually very sparse and should be compressed. Hierarchical bit-vector compression consists of partitioning a vector vi into equi-sized blocks, constructing a new bit-vector vi+1 which points to the non-zero blocks in vi, dropping the zero-blocks of vi, and repeating the process for vi+1. We refine the method by pruning some of the tree branches if they ultimately point to very few documents; these document numbers are then added to an appended list which is compressed by the prefix-omission technique. The new method was thoroughly tested on the bit-maps of the Responsa Retrieval Project, and gave a relative improvement of about 40% over the conventional hierarchical compression method.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Choueka Y., Full text systems and Research in the Humanities, Computer8 and the Humanities X_IV (1980) 153-169.
|
 |
2
|
|
| |
3
|
Fraenkel A.S., All about the Responsa Retrieval Project you always wanted to know but were afraid to ask, Expanded Summary, Jurimetric8 J. 16 (1976) 149- 156.
|
| |
4
|
Fraenkel A.S., Klein S.T., Novel Compression of sparse Bit-Strings-- Preliminaxy Report, Combinatorial Algorithnts on Wo#ds, NATO ASI Series Vol F12, Springer Verlag, Berlin (1985) 169-183.
|
| |
5
|
Jakobsson M., Huffman coding in Bit- Vector Compression, Inf. Proc. Letters T (1978) 304-307.
|
| |
6
|
J akobsson M., Evaluation of s Hierarchical Bit-Vector Compression Technique, Inf. Proc. Letters 14 (1982) 147-149.
|
| |
7
|
|
| |
8
|
Nevalainen O., Jakobsson M., Berg R., Compression of clustered inverted files, in Proc. of the 7-th Syrup. on Math. Foundations of Comp. Sc., Zakopane, Poland (1978) 393-402.
|
| |
9
|
Schuegraf E.J., Compression of large in-. verted files with hyperbolic term distribution, Inf. Proc. and Management 12 (1976) 377-384.
|
| |
10
|
Teuhola 3., A Compression method for Clustered Bit-Vectors, inf. Proc. Letter8 7 (1978) 308-311.
|
 |
11
|
|
| |
12
|
Wedekind H., H#rder T., Datenbank- 8#teme II, B.-I. Wissenschaftsverlag, Mannheim (1976).
|
CITED BY 12
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Y. Choueka , A. Fraenkel , S. Klein , E. Segal, Improved techniques for processing queries in full-text systems, Proceedings of the 10th annual international ACM SIGIR conference on Research and development in information retrieval, p.306-315, June 03-05, 1987, New Orleans, Louisiana, United States
|
|
|
A. Bookstein , S. T. Klein , T. Raita, Detecting content-bearing words by serial clustering—extended abstract, Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval, p.319-327, July 09-13, 1995, Seattle, Washington, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|