Incremental updates of inverted lists for text document retrieval
Source
International Conference on Management of Data
Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Minneapolis, Minnesota, United States
Pages: 289 - 300
Year of Publication: 1994
ISBN: 0-89791-639-5
Authors
Anthony Tomasic, Stanford University, Department of Computer Science, Stanford, CA
Héctor García-Molina, Stanford University, Department of Computer Science, Stanford, CA
Kurt Shoens, IBM Almaden Research Center, 650 Harry Road, San Jose, CA
ABSTRACT
With the proliferation of the world's “information highways,” a renewed interest in efficient document indexing techniques has come about. In this paper, the problem of incremental updates of inverted lists is addressed using a new dual-structure index. The index dynamically separates long and short inverted lists and optimizes retrieval, update, and storage of each type of list. To study the behavior of the index, a space of engineering trade-offs that range from optimizing update time to optimizing query performance is described. We quantitatively explore this space by using actual data and hardware in combination with a simulation of an information retrieval system. We then describe the best algorithm for a variety of criteria.
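As a rough illustration of the idea of separating long and short inverted lists, the following is a minimal Python sketch. It is not the algorithm from the paper: the class name, the `threshold` parameter, and the rule that a list moves to the long store once it grows past the threshold are all assumptions made for this example, and both stores are plain in-memory dictionaries rather than disk structures.

```python
class DualStructureIndex:
    """Toy index that keeps short and long inverted lists in separate stores.

    Hypothetical sketch: names and the migration rule are illustrative
    assumptions, not details taken from the paper.
    """

    def __init__(self, threshold=4):
        # Assumed cutoff: lists with more postings than this count as "long".
        self.threshold = threshold
        self.short_lists = {}  # word -> small list of document ids
        self.long_lists = {}   # word -> large list of document ids

    def add_document(self, doc_id, words):
        """Incrementally add one document's distinct words to the index."""
        for word in sorted(set(words)):
            if word in self.long_lists:
                # Long lists are only ever appended to.
                self.long_lists[word].append(doc_id)
                continue
            postings = self.short_lists.setdefault(word, [])
            postings.append(doc_id)
            if len(postings) > self.threshold:
                # The list has outgrown the short store: migrate it.
                self.long_lists[word] = self.short_lists.pop(word)

    def lookup(self, word):
        """Return the document ids for a word, whichever store holds it."""
        if word in self.long_lists:
            return self.long_lists[word]
        return self.short_lists.get(word, [])
```

A real system would presumably back each store with a different on-disk layout, which is where the trade-off between update time and query performance would show up; this sketch only shows the dynamic separation itself.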
CITED BY 45
Sergey Melnik, Sriram Raghavan, Beverly Yang, Hector Garcia-Molina, Building a distributed full-text index for the Web, Proceedings of the 10th international conference on World Wide Web, p.396-406, May 01-05, 2001, Hong Kong, Hong Kong
Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey Scott Vitter, Ramesh Agarwal, Dynamic maintenance of web indexes using landmarks, Proceedings of the 12th international conference on World Wide Web, May 20-24, 2003, Budapest, Hungary
Charles L. A. Clarke, Philip L. Tilker, Allen Quoc-Luan Tran, Kevin Harris, Antonio S. Cheng, A reliable storage management layer for distributed information retrieval systems, Proceedings of the twelfth international conference on Information and knowledge management, November 03-08, 2003, New Orleans, LA, USA
K. L. Liu, G. J. Lipovski, C. Yu, Naphtali Rishe, Efficient processing of one and two dimensional proximity queries in associative memory, Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, p.138-146, August 18-22, 1996, Zurich, Switzerland
Junghoo Cho, Hector Garcia-Molina, Taher Haveliwala, Wang Lam, Andreas Paepcke, Sriram Raghavan, Gary Wesley, Stanford WebBase components and applications, ACM Transactions on Internet Technology (TOIT), v.6 n.2, p.153-186, May 2006
Steve Lawrence, Kurt Bollacker, C. Lee Giles, Indexing and retrieval of scientific literature, Proceedings of the eighth international conference on Information and knowledge management, p.139-146, November 02-06, 1999, Kansas City, Missouri, United States
B. Barla Cambazoglu, Evren Karaca, Tayfun Kucukyilmaz, Ata Turk, Cevdet Aykanat, Architecture of a grid-enabled Web search engine, Information Processing and Management: an International Journal, v.43 n.3, p.609-623, May, 2007
Ronny Lempel, Yosi Mass, Shila Ofek-Koifman, Dafna Sheinwald, Yael Petruschka, Ron Sivan, Just in time indexing for up to the second search, Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, November 06-10, 2007, Lisbon, Portugal
Vuk Ercegovac, Vanja Josifovski, Ning Li, Mauricio R. Mediano, Eugene J. Shekita, Supporting sub-document updates and queries in an inverted index, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA