ACM Home Page
Please provide us with feedback. Feedback
Inverted files for text search engines
Full text PdfPdf (944 KB)
Source ACM Computing Surveys (CSUR) archive
Volume 38 ,  Issue 2  (2006) table of contents
Article No. 6  
Year of Publication: 2006
ISSN:0360-0300
Authors
Justin Zobel  RMIT University, Australia
Alistair Moffat  The University of Melbourne, Australia
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 170,   Downloads (12 Months): 1474,   Citation Count: 35
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1132956.1132959
What is a DOI?

ABSTRACT

The technology underlying text search engines has advanced dramatically in the past decade. The development of a family of new index representations has led to a wide range of innovations in index storage, index construction, and query evaluation. While some of these developments have been consolidated in textbooks, many specific techniques are not widely known or the textbook descriptions are out of date. In this tutorial, we introduce the key techniques in the area, describing both a core implementation and how the core can be enhanced through a range of extensions. We conclude with a comprehensive bibliography of text indexing literature.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
 
4
5
 
6
Badue, C., Baeza-Yates, R., Ribeiro-Neto, B., and Ziviani, N. 2001. Distributed query processing using partitioned inverted files. In Proceedings of String Processing and Information Retrieval Symposium, Laguna de San Rafael, Chile. G. Navarro, Ed. IEEE Computer Society, 10--20.
 
7
 
8
 
9
 
10
 
11
 
12
Bayer, R. and McCreight, R. 1972. Organization and maintenance of large ordered indexes. Acta Informatica 1, 173--189.
 
13
Beaulieu, M., Baeza-Yates, R., Myaeng, S. H., and Järvelin, K., Eds. 2002. Proceedings of the 25th Annual International ACM SIGIR Conf. on Research and Development in Information Retrieval. Tampere, Finland, ACM Press.
 
14
 
15
 
16
17
18
 
19
 
20
 
21
 
22
Bookstein, A. and Klein, S. T. 1991b. Flexible compression for bitmap sets. In Proceedings of the IEEE Data Compression Conference, Snowbird, UT, J. Storer and M. Cohn, Eds. IEEE Computer Society Press, Los Alamitos, CA. 402--410.
23
 
24
 
25
 
26
 
27
Brisaboa, N. R., Fariña, A., Navarro, G., and Esteller, M. F. 2003. (S,C)-dense coding: An optimized compression code for natural language text databases. In Proceedings of String Processing and Information Retrieval Symposium, Manaus, Brazil, M. A. Nascimento, Ed. Springer, 122--136.
28
 
29
 
30
31
32
33
 
34
Cacheda, F., Plachouras, V., and Ounis, I. 2004. Performance analysis of distributed architectures to index one terabyte of text. In Proceedings of the European Conference on IR Research, Sunderland, UK, S. McDonald and J. Tait, Eds. 395--408. Lecture Notes in Computer Science, Springer, vol. 2997.
35
36
 
37
38
39
40
41
42
43
44
 
45
Clarke, C. L. A. and Cormack, G. V. 1995. Dynamic inverted indexes for a distributed full-text retrieval system. Tech. rep. MT-95-01, Department of Computer Science, University of Waterloo, Waterloo, Canada.
46
 
47
Clarke, C. L. A., Cormack, G. V., and Burkowski, F. J. 1994. Fast inverted indexes with on-line update. Tech. rep. CS-94-40, Department of Computer Science, University of Waterloo, Waterloo, Canada.
 
48
 
49
50
 
51
Croft, W. B., Harper, D. J., Kraft, D. H., and Zobel, J., Eds. 2001. Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, LA. ACM Press.
 
52
53
54
 
55
Culpepper, J. S. and Moffat, A. 2005. Enhanced byte codes with restricted prefix properties. In Proceedings of the 12th International Symposium on String Processing and Information Retrieval. Buenos Aires, Argentina, M. P. Consens and G. Navarro, Eds. Lecture Notes in Computer Science, vol. 3772, Springer, 1--12.
56
57
 
58
Owen de Kretser , Alistair Moffat, SEFT: a search engine for text, Software—Practice & Experience, v.34 n.10, p.1011-1023, August 2004
 
59
60
61
 
62
Elias, P. 1975. Universal codeword sets and representations of the integers. IEEE Trans. Inform. Theory IT-21, 2 (March), 194--203.
63
64
 
65
 
66
 
67
 
68
Fraenkel, A. S. and Klein, S. T. 1985. Novel compression of sparse bit-strings---Preliminary report. In Combinatorial Algorithms on Words, Volume 12, A. Apostolico and Z. Galil, Eds. NATO ASI Series F. Springer, Berlin, Germany, 169--183.
 
69
70
 
71
Gallager, R. G. and Van Voorhis, D. C. 1975. Optimal source codes for geometrically distributed integer alphabets. IEEE Trans. Inform. Theory IT--21, 2 (March), 228--230.
 
72
 
73
Golomb, S. W. 1966. Run-length encodings. IEEE Trans. Inform. Theory IT--12, 3 (July), 399--401.
74
 
75
 
76
 
77
 
78
Harman, D. K. and Candela, G. 1990. Retrieving records from a gigabyte of text on a minicomputer using statistical ranking. J. Amer. Soc. Inform. Science 41, 8 (Aug.), 581--589.
 
79
Harper, D. J. 1982. An evaluation of four information storage and retrieval packages. Tech. rep. 7, CSIRO Division of Computing Research, Canberra, Australia.
80
 
81
82
 
83
84
 
85
 
86
Ivie, E. L. 1966. Search procedure based on measures of relatedness between documents. Ph.D. thesis. MIT, Cambridge, MA.
 
87
Jakobsson, M. 1978. Huffman coding in bit-vector compression. Inform. Pro. Let. 7, 6 (Oct.) 304--307.
 
88
89
90
 
91
Kent, A. J., Sacks-Davis, R., and Ramamohanarao, K. 1990. A signature file scheme based on multiple organisations for indexing very large text databases. J. Amer. Soc. Inform. Science 41, 7, 508--534.
92
93
94
 
95
 
96
Lancaster, F. W. and Fayen, E. G. 1973. Information Retrieval OnLine. Melville, Los Angeles, CA.
 
97
98
99
100
101
 
102
 
103
Lester, N., Moffat, A., Webber, W., and Zobel, J. 2005. Space-limited ranked query evaluation using adaptive pruning. In Proceedings of the 6th International Conference on Web Informations Systems. Lecture Notes in Computer Science, vol. 3806, Springer, 470--477.
104
 
105
Lester, N., Zobel, J., and Williams, H. E. 2006. Efficient online index maintenance for text retrieval systems. Inform. Proce. Manag. To appear.
106
107
 
108
 
109
 
110
Luhn, H. P. 1957. A statistical approach to mechanised encoding and searching of library information. IBM J. Resear. Develop. 1, 309--317.
 
111
 
112
 
113
Manber, U. and Wu, S. 1994. GLIMPSE: a tool to search through entire file systems. In USENIX Winter Technical Conference. San Francisco CA. USENIX Association, Berkeley, CA. 23--32.
 
114
115
 
116
 
117
 
118
Matthew, F. W. and Thomson, L. 1967. Weighted term search: a computer program for an inverted coordinate search on magnetic tape. J. Chem. Document. 7, 1, 49--56.
 
119
McDonell, K. J. 1977. An inverted index implementation. Comput. J. 20, 1, 116--123.
120
 
121
Moffat, A. 1992. Economical inversion of large text files. Comput. Syst. 5, 2, 125--139.
 
122
 
123
 
124
 
125
Moffat, A., Webber, W., Zobel, J., and Baeza-Yates, R. 2005. A pipelined architecture for distributed text query evaluation. Submitted for publication.
 
126
Moffat, A. and Zobel, J. 1992a. Coding for compression in full-text retrieval systems. In Proceedings of the IEEE Data Compression Conference, Snowbird, UT, J. A. Storer and M. Cohn, Eds. IEEE Computer Society Press, Los Alamitos, CA, 72--81.
127
128
 
129
Moffat, A. and Zobel, J. 2004. What does it mean to “measure performance”? In Proceedings of the 5th International Conference on Web Informations Systems, Brisbane, Australia. X. Zhou, S. Su, M. P. Papazoglou, M. E. Owlowska, and K. Jeffrey, Eds. Lecture Notes in Computer Science, vol. 3306. Springer, 1--12.
 
130
 
131
 
132
 
133
Noreault, T., Koll, M., and McGill, M. J. 1977. Automatic ranked output from Boolean searches in SIRE. J. Amer, Soc. Inform. Science 28, 333--339.
 
134
Perry, S. A. and Willett, P. 1983. A review of the use of inverted files for best match searching in information retrieval systems. J. Inform. Science 6, 59--66.
 
135
136
137
 
138
139
140
 
141
Rice, R. F. 1979. Some practical universal noiseless coding techniques. Tech. Rep. 79--22, Jet Propulsion Laboratory, Pasadena, CA.
 
142
Robertson, S. E. 1977. The probability ranking principle in IR. J. Document. 33, 4 (Dec.) 294--304.
 
143
Robertson, S. E., Walker, S., Jones, S., Hancock-Beaulieu, M. M., and Gatford, M. 1994. Okapi at TREC-3. In Overview of the 3rd TREC Text REtrieval Conference, Gaithersburg, MD, D. Harman, Ed. NIST, NIST Special Publication 500-226.
 
144
Rogers, W., Candela, G., and Harman, D. 1995. Space and time improvements for indexing in information retrieval. In Proceedings of the Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, NV, L. Spitz and D. D. Lewis, Eds.
145
 
146
 
147
Salton, G. 1962. The use of citations as an aid to automatic content analysis. Tech. Rep. ISR-2, Section III, Harvard Computation Laboratory, Cambridge, MA.
 
148
 
149
150
 
151
152
 
153
 
154
155
156
 
157
158
 
159
Schuegraf, E. J. 1976. Compression of large inverted files with hyperbolic term distribution. Inform. Proc. Manag. 12, 377--384.
 
160
161
 
162
Shieh, W.-Y., Chen, T.-F., and Chung, C.-P. 2003. A tree-based inverted file for fast ranked-document retrieval. In Proceedings of the International Conference on Information and Knowledge Engineering. Las Vegas, NV. H. R. Arabnia, Ed. CSREA Press, 64--69.
 
163
 
164
 
165
Shieh, W.-Y., Shann, J. J.-J., and Chung, C.-P. 2003. An inverted file cache for fast information retrieval. J. Inform. Science Eng. 19, 4, 681--695.
 
166
167
168
169
 
170
 
171
 
172
 
173
Spink, A. and Xu, J. L. 2000. Selected results from a large study of web searching: The Excite study. Inform. Resear.---Int. Electron. J. 6, 1.
174
175
 
176
Stellhorn, W. H. 1977. An inverted file processor for information retrieval. IEEE Trans. Comput. 26, 12, 1258--1267.
177
178
 
179
Teuhola, J. 1978. A compression method for clustered bit-vectors. Inform. Proc. Lett. 7, 6 (Oct.) 308--311.
 
180
 
181
182
 
183
 
184
 
185
van Rijsbergen, C. J. 1979. Informat. Retriev., 2nd Ed. Butterworths, London, UK.
 
186
Vasanthakumar, S. R., Callan, J. P., and Croft, W. B. 1996. Integrating INQUERY with an RDBMS to support text retrieval. Bull. Techn. Comm. Data Eng. 19, 1, 24--33.
187
188
 
189
 
190
Williams, H. E. and Zobel, J. 1999. Compressing integers for fast file access. Comput. J. 42, 3, 193--201.
 
191
Williams, H. E., Zobel, J., and Anderson, P. 1999. What's next? Index structures for efficient phrase querying. In Proceedings of the Australasian Database Conference. Auckland, New Zealand. M. Orlowska, Ed. Australian Computer Society, 141--152.
192
 
193
Witten, I. H., Bell, T. C., and Nevill, C. G. 1991. Models for compression in full-text retrieval systems. In Proceedings of the IEEE Data Compression Conference. Snowbird, UT, J. A. Storer and J. H. Reif, Eds. IEEE Computer Society Press, Los Alamitos, CA. 23--32.
 
194
 
195
 
196
Xi, W., Sornil, O., and Fox, E. A. 2002a. Hybrid partition inverted files for large-scale digital libraries. In Proceedings of the Digital Library: IT Opportunities and Challenges in the New Millennium. Beijing Library Press, Beijing, China, 404--418.
 
197
198
199
200
201
 
202
 
203
 
204
Zobel, J., Moffat, A., and Sacks-Davis, R. 1993b. Storage management for files of dynamic records. In Proceedings of the Australasian Database Conference. Brisbane, Australia, 26--38.
 
205

CITED BY  34

Collaborative Colleagues:
Justin Zobel: colleagues
Alistair Moffat: colleagues