Evaluating topic-driven web crawlers
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
New Orleans, Louisiana, United States
Pages: 241-249
Year of Publication: 2001
ISBN: 1-58113-331-6
Authors
Filippo Menczer  Univ. of Iowa, Iowa City
Gautam Pant  Univ. of Iowa, Iowa City
Padmini Srinivasan  Univ. of Iowa, Iowa City
Miguel E. Ruiz  Textwise, Syracuse, NY
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Citation Count: 34
DOI: http://doi.acm.org/10.1145/383952.383995

ABSTRACT

Due to limited bandwidth, storage, and computational resources, and to the dynamic nature of the Web, search engines cannot index every Web page, and even the covered portion of the Web cannot be monitored continuously for changes. Therefore it is essential to develop effective crawling strategies to prioritize the pages to be indexed. The issue is even more important for topic-specific search engines, where crawlers must make additional decisions based on the relevance of visited pages. However, it is difficult to evaluate alternative crawling strategies because relevant sets are unknown and the search space is changing. We propose three different methods to evaluate crawling strategies. We apply the proposed metrics to compare three topic-driven crawling algorithms based on similarity ranking, link analysis, and adaptive agents.
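
To make the idea of prioritizing pages by topical similarity concrete, the following is a minimal sketch of a best-first crawl frontier. It is not the algorithm evaluated in the paper. It assumes a hypothetical in-memory web (`TOY_WEB`), whitespace tokenization, and the simplification that each discovered link inherits the cosine similarity between the topic and the page it was found on. The names `cosine` and `best_first_crawl` are illustrative only.

```python
import heapq
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best_first_crawl(pages, seeds, topic, max_pages):
    """Visit up to max_pages pages, always expanding the best-scored URL.

    pages: hypothetical in-memory web, mapping url -> (text, [outlinks]).
    Assumption: a link inherits the topic similarity of its parent page.
    """
    query = Counter(topic.lower().split())
    # heapq is a min-heap, so scores are negated to pop the best first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        text, outlinks = pages[url]
        visited.append(url)
        score = cosine(query, Counter(text.lower().split()))
        for link in outlinks:
            if link in pages and link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return visited


# Toy data, invented purely for illustration.
TOY_WEB = {
    "seed": ("links page", ["a", "b"]),
    "a": ("web crawler evaluation", ["c"]),
    "b": ("cooking recipes", ["d"]),
    "c": ("topic driven web crawler", []),
    "d": ("pasta sauce", []),
}

order = best_first_crawl(TOY_WEB, ["seed"], "web crawler", max_pages=4)
print(order)
```

With a budget of four pages, the link found on the on-topic page `a` is expanded before the off-topic page `b`, and `d` is never fetched. A real crawler would replace `TOY_WEB` with network fetches and HTML link extraction, which this sketch deliberately omits.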

