Searching the Web
Full text: PDF (320 KB)
Source: ACM Transactions on Internet Technology (TOIT)
Volume 1, Issue 1 (August 2001)
Pages: 2-43
Year of Publication: 2001
ISSN: 1533-5399
Authors
Arvind Arasu, Stanford University, Computer Science Dept., Stanford, CA
Junghoo Cho, Stanford University, Computer Science Dept., Stanford, CA
Hector Garcia-Molina, Stanford University, Computer Science Dept., Stanford, CA
Andreas Paepcke, Stanford University, Computer Science Dept., Stanford, CA
Sriram Raghavan, Stanford University, Computer Science Dept., Stanford, CA
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 64, Downloads (12 Months): 495, Citation Count: 71
DOI: http://doi.acm.org/10.1145/383034.383035

ABSTRACT

We offer an overview of current Web search engine design. After introducing a generic search engine architecture, we examine each engine component in turn. We cover crawling, local Web page storage, indexing, and the use of link analysis for boosting search performance. The most common design and implementation techniques for each of these components are presented. For this presentation we draw from the literature and from our own experimental search engine testbed. Emphasis is on introducing the fundamental concepts and the results of several performance analyses we conducted to compare different designs.


REVIEW

Reviewer: Jun Lin

Are you curious about the way Web search engines provide users with a list of URLs after just a few keywords are entered? This article gives an overview of the core engine that makes this possible.

The authors start by discussing the challen...
