Parallel crawling for online social networks
Source
International World Wide Web Conference
Proceedings of the 16th International Conference on World Wide Web
Banff, Alberta, Canada
Poster session: Social networks
Pages: 1283-1284
Year of Publication: 2007
ISBN: 978-1-59593-654-7
Authors
Duen Horng Chau, Carnegie Mellon University, Pittsburgh, PA
Shashank Pandit, Carnegie Mellon University, Pittsburgh, PA
Samuel Wang, Carnegie Mellon University, Pittsburgh, PA
Christos Faloutsos, Carnegie Mellon University, Pittsburgh, PA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1242572.1242809

ABSTRACT

Given a huge online social network, how do we retrieve information from it through crawling? Even better, how do we improve crawling performance by using parallel crawlers that work independently? In this paper, we present a framework of parallel crawlers for online social networks that uses a centralized queue. To show how this works in practice, we describe our implementation of the crawlers for an online auction website. The crawlers work independently, so the failure of one crawler does not affect the others. The framework ensures that no redundant crawling occurs. Using the crawlers we built, we visited a total of approximately 11 million auction users, about 66,000 of which were completely crawled.
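The abstract describes independent crawlers that share one centralized queue, which prevents redundant crawling and lets the remaining crawlers continue when one fails. Below is a minimal sketch of that idea, under stated assumptions. It uses an in-memory thread-safe queue plus a shared "seen" set. The abstract does not say how the authors' queue is actually stored. `GRAPH`, `CentralQueue`, `crawler`, `crawl` and `fail_user` are all hypothetical names. The toy graph stands in for downloading user pages from the auction site.

```python
import queue
import threading

# Hypothetical stand-in for the social network (user -> neighbouring users).
# A real crawler would download and parse each user's page instead.
GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave", "erin"],
    "dave": ["bob", "carol"],
    "erin": ["carol"],
}


class CentralQueue:
    """Shared frontier: each discovered user is enqueued exactly once."""

    def __init__(self, seeds):
        self._pending = queue.Queue()
        self._seen = set()
        self._lock = threading.Lock()
        for user in seeds:
            self.add(user)

    def add(self, user):
        # Check-and-insert under one lock so two crawlers discovering the
        # same user at once cannot both enqueue it (no redundant crawling).
        with self._lock:
            if user in self._seen:
                return
            self._seen.add(user)
        self._pending.put(user)

    def take(self):
        return self._pending.get()

    def give_back(self, user):
        # Return a user whose crawl failed so another crawler can retry it.
        self._pending.put(user)

    def finish(self):
        self._pending.task_done()

    def wait_until_empty(self):
        self._pending.join()


def crawler(frontier, crawled, crawled_lock, fail_user=None):
    """One independent crawler; if it crashes, only this crawler stops."""
    while True:
        user = frontier.take()
        try:
            if user == fail_user:
                raise RuntimeError(f"simulated crash on {user}")
            neighbours = GRAPH.get(user, [])  # hypothetical "fetch page" step
            with crawled_lock:
                crawled.append(user)
            for other in neighbours:
                frontier.add(other)
        except RuntimeError:
            frontier.give_back(user)  # hand the work to the other crawlers
            return
        finally:
            frontier.finish()  # runs on success and on the crash path


def crawl(seeds, n_crawlers=3, fail_user=None):
    """Run n_crawlers threads; crawler 0 is made to crash on fail_user.

    The failure demo needs n_crawlers >= 2, or nobody is left to retry.
    """
    frontier = CentralQueue(seeds)
    crawled, crawled_lock = [], threading.Lock()
    for i in range(n_crawlers):
        threading.Thread(
            target=crawler,
            args=(frontier, crawled, crawled_lock, fail_user if i == 0 else None),
            daemon=True,
        ).start()
    frontier.wait_until_empty()  # every discovered user has been processed
    return crawled


if __name__ == "__main__":
    users = crawl(["alice"], n_crawlers=3, fail_user="carol")
    print(sorted(users))  # ['alice', 'bob', 'carol', 'dave', 'erin']
```

Pairing `task_done()` with `join()` lets the caller detect when the whole frontier is exhausted without polling. A crawler that crashes puts its user back *before* calling `task_done()`, so the queue never looks empty while work is still outstanding. Running crawlers on separate machines would require moving the seen set and queue into shared storage, such as a database. That is an assumption beyond what the abstract states.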