ABSTRACT
We identify crucial design issues in building a distributed inverted index for a large collection of Web pages. We introduce a novel pipelining technique for structuring the core index-building system that substantially reduces the index construction time. We also propose a storage scheme for creating and managing inverted files using an embedded database system. We suggest and compare different strategies for collecting global statistics from distributed inverted indexes. Finally, we present performance results from experiments on a testbed distributed Web indexing system that we have implemented.
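As a rough illustration of what collecting global statistics from distributed inverted indexes involves, the sketch below builds a small inverted index per node and then sums per-node document frequencies into collection-wide values. It is a minimal sketch, not the method evaluated in the paper: it assumes that documents are partitioned disjointly across nodes, and the function names (`build_local_index`, `merge_global_stats`), the whitespace tokenizer, and the logarithmic IDF formula are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict
import math


def build_local_index(docs):
    """Build an inverted index {term: sorted doc ids} for one node's documents.

    Assumption: `docs` maps doc id -> text, and tokens are split on whitespace.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def merge_global_stats(local_indexes, local_doc_counts):
    """Sum per-node document frequencies into collection-wide statistics.

    Assumption: each document lives on exactly one node, so local
    document frequencies can simply be added together.
    """
    df = Counter()
    for index in local_indexes:
        for term, postings in index.items():
            df[term] += len(postings)
    total_docs = sum(local_doc_counts)
    # One common IDF variant; other weighting formulas would work equally well here.
    idf = {term: math.log(total_docs / n) for term, n in df.items()}
    return df, idf


# Two hypothetical nodes, each holding a disjoint slice of the collection.
node_a = {1: "web index build", 2: "inverted index"}
node_b = {3: "web pages", 4: "distributed index"}

idx_a = build_local_index(node_a)
idx_b = build_local_index(node_b)
df, idf = merge_global_stats([idx_a, idx_b], [len(node_a), len(node_b)])
```

In this sketch a single coordinator merges the statistics after all local indexes are built. Whether statistics are gathered during or after index construction is exactly the kind of design choice the abstract says the paper compares.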
REVIEW
Reviewer: Dimitrios Katsaros
The building of a distributed, full-text (inverted) index for very large collections of documents, such as those encountered in search engines for the Web, can create architectural challenges. This paper explains how a three-tier architecture can ...