Graph based crawler seed selection
Source
International World Wide Web Conference archive
Proceedings of the 18th International Conference on World Wide Web
Madrid, Spain
POSTER SESSION: Wednesday, April 22, 2009
Pages 1089-1090
Year of Publication: 2009
ISBN: 978-1-60558-487-4
Authors
Shuyi Zheng, Pennsylvania State University, University Park, PA, USA
Pavel Dmitriev, Yahoo! Labs, Santa Clara, CA, USA
C. Lee Giles, Pennsylvania State University, University Park, PA, USA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 81,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1526709.1526870

ABSTRACT

This paper identifies and explores the problem of seed selection in a web-scale crawler. We argue that seed selection is a nontrivial and very important problem. Selecting proper seeds can increase the number of pages a crawler discovers and can yield a collection with more "good" and fewer "bad" pages. Based on an analysis of the graph structure of the web, we propose several seed selection algorithms. Experimental results on real web data demonstrate the effectiveness of these algorithms.
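
The abstract does not spell out the proposed algorithms. Purely as an illustration, one common way to frame graph-based seed selection is as a maximum k-coverage problem (the topic of the Hochbaum and Pathria entry in the reference list): each candidate seed "covers" the pages reachable from it within a few link hops, and seeds are picked greedily by how many not-yet-covered pages they add. The sketch below assumes that framing. The names `reachable`, `greedy_seeds`, and `max_hops`, and the toy graph, are hypothetical and are not taken from the paper.

```python
from collections import deque


def reachable(graph, start, max_hops):
    """Return the set of pages reachable from `start` within `max_hops` links (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        page, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(page, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


def greedy_seeds(graph, k, max_hops=2):
    """Greedily pick up to k seeds, each maximizing newly covered pages.

    Assumption: only pages that appear as keys of `graph` are seed candidates.
    """
    candidates = sorted(graph)  # sorted for deterministic tie-breaking
    coverage = {c: reachable(graph, c, max_hops) for c in candidates}
    covered, seeds = set(), []
    for _ in range(k):
        best = max(
            (c for c in candidates if c not in seeds),
            key=lambda c: len(coverage[c] - covered),
            default=None,
        )
        if best is None or not (coverage[best] - covered):
            break  # no remaining candidate adds new pages
        seeds.append(best)
        covered |= coverage[best]
    return seeds, covered


# Toy link graph (hypothetical): page -> list of pages it links to.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": [], "d": ["e"], "e": ["d"], "f": []}
seeds, covered = greedy_seeds(toy_graph, k=2)
print(seeds, len(covered))
```

On the toy graph, the first pick is the page whose two-hop neighborhood is largest, and the second pick comes from the component the first seed cannot reach. This mirrors the abstract's point that seed choice affects how many pages a crawler discovers. It does not model page quality ("good" versus "bad" pages), which the paper also considers.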


REFERENCES

1. D. Hochbaum and A. Pathria. Analysis of the Greedy Approach in Problems of Maximum k-Coverage. Naval Research Logistics, 45(6):615–627, 1998.
2. G. Pant, P. Srinivasan, and F. Menczer. Crawling the Web. Web Dynamics, pages 153–178, 2004.
