Just in time indexing for up to the second search
Source
Conference on Information and Knowledge Management
Proceedings of the sixteenth ACM Conference on Information and Knowledge Management
Lisbon, Portugal
SESSION: Enterprise information management (IND)
Pages 97-106  
Year of Publication: 2007
ISBN: 978-1-59593-803-9
Authors
Ronny Lempel  IBM Research, Haifa, Israel
Yosi Mass  IBM Research, Haifa, Israel
Shila Ofek-Koifman  IBM Research, Haifa, Israel
Dafna Sheinwald  IBM Research, Haifa, Israel
Yael Petruschka  IBM Research, Haifa, Israel
Ron Sivan  IBM Research, Haifa, Israel
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1321440.1321457

ABSTRACT

E-commerce and intranet search systems require newly arriving content to be indexed and made available for search within minutes or hours of arrival. Applications such as file system and email search demand even faster turnaround, requiring new content to become searchable almost instantaneously. However, incrementally updating inverted indices, which are the predominant data structure used in search engines, is an expensive operation that most systems avoid performing at high rates.

We present JiTI, a Just-in-Time Indexing component that allows searching over incoming content (nearly) as soon as that content reaches the system. JiTI's main idea is to invest less in the preprocessing of arriving data, at the expense of a tolerable latency in query response time. It is designed for deployment in search systems that maintain a large main index and that rebuild smaller stop-press indices once or twice an hour. JiTI augments such systems with instant retrieval capabilities over content arriving in between the stop-press builds. A main design point is for JiTI to demand few computational resources, in particular RAM and I/O.
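
To make the trade-off described above concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify JiTI's data structures, so the class names (JustInTimeBuffer, SearchSystem), the whitespace tokenizer, the conjunctive matching, and the linear scan at query time are all assumptions chosen for illustration. The sketch only shows the general shape of the idea: arriving documents receive almost no preprocessing, queries pay a small extra cost to scan them, and the buffer is emptied whenever a stop-press index is rebuilt.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class JustInTimeBuffer:
    """Hypothetical holding area for documents that arrived after the
    last stop-press build (an assumption, not JiTI's real structure)."""

    docs: dict[int, list[str]] = field(default_factory=dict)

    def add(self, doc_id: int, text: str) -> None:
        # Cheap ingestion: tokenize only, no inverted-index update.
        self.docs[doc_id] = text.lower().split()

    def search(self, query: str) -> list[int]:
        # The cost moves to query time: scan every buffered document
        # and keep those containing all query terms.
        terms = set(query.lower().split())
        return sorted(d for d, toks in self.docs.items()
                      if terms.issubset(toks))

    def drain(self) -> dict[int, list[str]]:
        # Hand the buffered documents to the index builder and reset.
        drained, self.docs = self.docs, {}
        return drained


class SearchSystem:
    """Toy system: one inverted index stands in for the main and
    stop-press indices; the buffer covers the gap between builds."""

    def __init__(self) -> None:
        self.indexed: dict[str, set[int]] = {}
        self.jit = JustInTimeBuffer()

    def ingest(self, doc_id: int, text: str) -> None:
        self.jit.add(doc_id, text)

    def rebuild_stop_press(self) -> None:
        # Periodic (e.g. once or twice an hour) expensive indexing step.
        for doc_id, tokens in self.jit.drain().items():
            for token in tokens:
                self.indexed.setdefault(token, set()).add(doc_id)

    def search(self, query: str) -> list[int]:
        terms = query.lower().split()
        if not terms:
            return []
        postings = [self.indexed.get(t, set()) for t in terms]
        from_index = set.intersection(*postings)
        # Merge indexed results with hits from not-yet-indexed content.
        return sorted(from_index | set(self.jit.search(query)))
```

In this sketch a document is searchable immediately after ingest(), before any call to rebuild_stop_press(); query latency grows with the number of buffered documents, which is why the buffer is kept small by the periodic rebuilds.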

Our experiments consisted of injecting several documents and queries per second concurrently into the system over half-hour-long periods. We believe that there are search applications for which the combination of the workloads we experimented with and the response times we measured presents a viable solution to a pressing problem.


