ACM Home Page
Please provide us with feedback. Feedback
AggregateRank: bringing order to web sites
Full text PdfPdf (460 KB)
Source Annual ACM Conference on Research and Development in Information Retrieval archive
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval table of contents
Seattle, Washington, USA
SESSION: Web 1--exploiting graph structure table of contents
Pages: 75 - 82  
Year of Publication: 2006
ISBN:1-59593-369-7
Authors
Guang Feng  Microsoft Research Asia, Beijing, P. R. China and Tsinghua University, Beijing, P.R. china
Tie-Yan Liu  Microsoft Research Asia, Beijing, P. R. China
Ying Wang  Chinese Academy of Science, Beijing, China
Ying Bao  Microsoft Research Asia, Beijing, P. R. China and Chinese Academy of Science, Beijing, China
Zhiming Ma  Chinese Academy of Science, Beijing, China
Xu-Dong Zhang  Tsinghua University, Beijing, P.R. China
Wei-Ying Ma  Microsoft Research Asia, Beijing, P. R. China
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 16,   Downloads (12 Months): 156,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1148170.1148187
What is a DOI?

ABSTRACT

Since the website is one of the most important organizational structures of the Web, how to effectively rank websites has been essential to many Web applications, such as Web search and crawling. In order to get the ranks of websites, researchers used to describe the inter-connectivity among websites with a so-called HostGraph in which the nodes denote websites and the edges denote linkages between websites (if and only if there are hyperlinks from the pages in one website to the pages in the other, there will be an edge between these two websites), and then adopted the random walk model in the HostGraph. However, as pointed in this paper, the random walk over such a HostGraph is not reasonable because it is not in accordance with the browsing behavior of web surfers. Therefore, the derivate rank cannot represent the true probability of visiting the corresponding website.In this work, we mathematically proved that the probability of visiting a website by the random web surfer should be equal to the sum of the PageRank values of the pages inside that website. Nevertheless, since the number of web pages is much larger than that of websites, it is not feasible to base the calculation of the ranks of websites on the calculation of PageRank. To tackle this problem, we proposed a novel method named AggregateRank rooted in the theory of stochastic complement, which cannot only approximate the sum of PageRank accurately, but also have a lower computational complexity than PageRank. Both theoretical analysis and experimental evaluation show that AggregateRank is a better method for ranking websites than previous methods.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
 
4
Yuangui Lei, Enrico Motta, Domingue. Modelling Data-Intensive Web Sites with OntoWeaver. In Proceedings of International Workshop on Web Information Systems Modeling, Riga, Latvia, June 2004.
 
5
Tao Qin, Tie-Yan Liu, Xu-Dong Zhang, Guang Feng, Wei-Ying Ma, Subsite Retrieval: A Novel Concept for Topic Distillation, In Proceedings of 2nd Asia Information Retrieval Symposium, Jeju Island, Korea, October 2005
 
6
M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA, pages 7821--7826, 2002.
 
7
Henzinger, M.R., Motwani, R., Silverstein, C. Challenges in Web Search Engines. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (2003) 1573--1579
 
8
L. Page, S. Brin, R. Motwani, and T. Winograd. The Pagerank Citation Ranking: Bringing Order to the Web. Technical report, Stanford Digital Libraries, 1998.
9
 
10
Jie Wu, Karl Aberer. Using SiteRank for P2P Web Retrieval. EPFL Technical Report ID: IC/2004/31, 2004
 
11
 
12
 
13
 
14
 
15
 
16
Amy N. Langville and Carl D. Meyer. Deeper inside PageRank. Technical report, NCSU Center for Res. Sci Comp., 2003.
 
17
John G Kemeny and J.Laurie Snell, Finite Markov Chains Springer-Verlag, New York Berlin Herdelberg Tokyo, 1976.
 
18
William J. Stewart. Introduction to the Numerical Solution of Markov Chains. Princeton : Princeton University Press, 1994
 
19
S. Kamvar, T. Haveliwala, C. Manning, and G. Golub. Exploiting the block structure of the web for computing pagerank. Technical report, Stanford Univ., 2003.
 
20
F. R. Gantmacher, Matrix Theory (Chelsea, 1959) Vol. II, Chapter XIII.
 
21
G. E. Cho and C. D. Meyer. Aggregation/Disaggregation Methods of Nearly Uncoupled Markov Chains. http://meyer.math.ncsu.edu/Meyer/PS_Files/Numcad.ps
 
22
R. Albert and A.-L. Barabasi. Statistical mechanics of complex networks. Rev. Mod. Phys. Vol. 74, January 2002.
 
23
M. Kendall and J. Gibbons. Rank Correlation Methods. Edward Arnold, London, 5 edition, 1990.


Collaborative Colleagues:
Guang Feng: colleagues
Tie-Yan Liu: colleagues
Ying Wang: colleagues
Ying Bao: colleagues
Zhiming Ma: colleagues
Xu-Dong Zhang: colleagues
Wei-Ying Ma: colleagues