Just in time indexing for up to the second search
Source
Proceedings of the sixteenth ACM Conference on Information and Knowledge Management
Lisbon, Portugal
SESSION: Enterprise information management (IND)
Pages 97-106
Year of Publication: 2007
ISBN: 978-1-59593-803-9
Authors
Ronny Lempel, IBM Research, Haifa, Israel
Yosi Mass, IBM Research, Haifa, Israel
Shila Ofek-Koifman, IBM Research, Haifa, Israel
Dafna Sheinwald, IBM Research, Haifa, Israel
Yael Petruschka, IBM Research, Haifa, Israel
Ron Sivan, IBM Research, Haifa, Israel
ABSTRACT
E-commerce and intranet search systems require newly arriving content to be indexed and made available for search within minutes or hours of arrival. Applications such as file system and email search demand even faster turnaround, requiring new content to become searchable almost instantaneously. However, incrementally updating inverted indices, the predominant data structure used in search engines, is an expensive operation that most systems avoid performing at high rates. We present JiTI, a Just-in-Time Indexing component that allows searching over incoming content (nearly) as soon as that content reaches the system. JiTI's main idea is to invest less in the preprocessing of arriving data, at the expense of a tolerable latency in query response time. It is designed for deployment in search systems that maintain a large main index and that rebuild smaller stop-press indices once or twice an hour. JiTI augments such systems with instant retrieval capabilities over content arriving in between the stop-press builds. A main design point is for JiTI to demand few computational resources, in particular RAM and I/O. Our experiments consisted of injecting several documents and queries per second concurrently into the system over half-hour-long periods. We believe that there are search applications for which the combination of the workloads we experimented with and the response times we measured presents a viable solution to a pressing problem.