Building a distributed full-text index for the web
Source: ACM Transactions on Information Systems (TOIS)
Volume 19, Issue 3 (July 2001)
Pages: 217-241
Year of Publication: 2001
ISSN: 1046-8188
Authors
Sergey Melnik  Stanford University, Computer Science Dept. Stanford, CA
Sriram Raghavan  Stanford University, Computer Science Dept. Stanford, CA
Beverly Yang  Stanford University, Computer Science Dept. Stanford, CA
Hector Garcia-Molina  Stanford University, Computer Science Dept. Stanford, CA
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/502115.502116

ABSTRACT

We identify crucial design issues in building a distributed inverted index for a large collection of Web pages. We introduce a novel pipelining technique for structuring the core index-building system that substantially reduces the index construction time. We also propose a storage scheme for creating and managing inverted files using an embedded database system. We suggest and compare different strategies for collecting global statistics from distributed inverted indexes. Finally, we present performance results from experiments on a testbed distributed Web indexing system that we have implemented.
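
As a rough illustration of two ideas named in the abstract (an inverted index and the collection of global statistics from distributed indexes), the following is a minimal sketch and not the authors' system. It assumes each node indexes a disjoint slice of the collection and that the global statistic of interest is document frequency, from which an inverse document frequency (IDF) is derived. All function names and the sample documents are hypothetical.

```python
from collections import Counter, defaultdict
import math

def build_inverted_index(docs):
    """Map each term to a sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def local_statistics(index):
    """Per-node document frequency: number of local documents per term."""
    return Counter({term: len(postings) for term, postings in index.items()})

def global_idf(per_node_stats, total_docs):
    """Sum local document frequencies, then derive a global IDF per term."""
    df = Counter()
    for stats in per_node_stats:
        df.update(stats)  # Counter.update adds counts term by term
    return {term: math.log(total_docs / count) for term, count in df.items()}

# Two hypothetical nodes, each holding a disjoint slice of the collection.
node_a = build_inverted_index({1: "web index build", 2: "distributed web index"})
node_b = build_inverted_index({3: "embedded database", 4: "web database"})

idf = global_idf([local_statistics(node_a), local_statistics(node_b)], total_docs=4)
```

Summing per-node counts is only valid here because the nodes hold disjoint documents; a collection partitioned by term rather than by document would need a different aggregation step.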



REVIEW

Reviewer: Dimitrios Katsaros

The building of a distributed, full-text (inverted) index for very large collections of documents, such as those encountered in search engines for the Web, can create architectural challenges. This paper explains how a three-tier architecture can ...
