Estimating the global pagerank of web communities
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Philadelphia, PA, USA
Session: Research track papers
Pages: 116-125
Year of Publication: 2006
ISBN:1-59593-339-5
Authors
Jason V. Davis  University of Texas at Austin, Austin, TX
Inderjit S. Dhillon  University of Texas at Austin, Austin, TX
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 19,   Downloads (12 Months): 93,   Citation Count: 4
DOI: http://doi.acm.org/10.1145/1150402.1150419

ABSTRACT

Localized search engines are small-scale systems that index a particular community on the web. They offer several benefits over their large-scale counterparts: they are relatively inexpensive to build, and they can provide more precise and complete search capability over their relevant domains. One disadvantage such systems have compared with large-scale search engines is the lack of global PageRank values. Such information is needed to assess the value of pages in the localized search domain within the context of the web as a whole. In this paper, we present well-motivated algorithms to estimate the global PageRank values of a local domain. The algorithms are all highly scalable in that, given a local domain of size n, they use O(n) resources, including computation time, bandwidth, and storage. We test our methods across a variety of localized domains, including site-specific domains and topic-specific domains. We demonstrate that by crawling as few as n or 2n additional pages, our methods can give excellent global PageRank estimates.
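
To make the problem concrete, the sketch below shows why a localized search engine lacks global PageRank values. It is not the estimation algorithm from the paper. It runs standard power-iteration PageRank once on a small made-up "whole web" graph and once on the same graph restricted to a local domain, so the two sets of scores can be compared. The graph, the page names, the damping factor of 0.85, and the helper names `pagerank` and `restrict_to_domain` are all illustrative assumptions.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a graph given as {page: [linked pages]}.

    Every page that appears as a link target must also appear as a key.
    """
    pages = sorted(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread uniformly.
        dangling = sum(rank[p] for p in pages if not out_links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            targets = out_links[p]
            for q in targets:
                new_rank[q] += damping * rank[p] / len(targets)
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


def restrict_to_domain(out_links, domain):
    """Keep only pages inside the local domain and the links between them."""
    return {
        p: [q for q in targets if q in domain]
        for p, targets in out_links.items()
        if p in domain
    }


# Hypothetical toy web: pages a, b, c form the local domain; x and y are external.
web = {
    "a": ["b", "x"],
    "b": ["a"],
    "c": ["a", "y"],
    "x": ["c"],
    "y": ["c", "x"],
}
local_domain = {"a", "b", "c"}

global_rank = pagerank(web)
local_rank = pagerank(restrict_to_domain(web, local_domain))

for page in sorted(local_domain):
    print(page, round(global_rank[page], 4), round(local_rank[page], 4))
```

In this toy graph, page c receives links only from external pages, so the local computation assigns it the minimum possible score even though it is well linked in the full graph. Closing that kind of gap without crawling the whole web is the problem the abstract describes.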



