| AggregateRank: bringing order to web sites |
| Full text |
Pdf
(460 KB)
|
| Source
|
Annual ACM Conference on Research and Development in Information Retrieval
archive
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
table of contents
Seattle, Washington, USA
SESSION: Web 1--exploiting graph structure
table of contents
Pages: 75 - 82
Year of Publication: 2006
ISBN:1-59593-369-7
|
|
Authors
|
|
Guang Feng
|
Microsoft Research Asia, Beijing, P. R. China and Tsinghua University, Beijing, P.R. china
|
|
Tie-Yan Liu
|
Microsoft Research Asia, Beijing, P. R. China
|
|
Ying Wang
|
Chinese Academy of Science, Beijing, China
|
|
Ying Bao
|
Microsoft Research Asia, Beijing, P. R. China and Chinese Academy of Science, Beijing, China
|
|
Zhiming Ma
|
Chinese Academy of Science, Beijing, China
|
|
Xu-Dong Zhang
|
Tsinghua University, Beijing, P.R. China
|
|
Wei-Ying Ma
|
Microsoft Research Asia, Beijing, P. R. China
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 16, Downloads (12 Months): 156, Citation Count: 4
|
|
|
ABSTRACT
Since the website is one of the most important organizational structures of the Web, how to effectively rank websites has been essential to many Web applications, such as Web search and crawling. In order to get the ranks of websites, researchers used to describe the inter-connectivity among websites with a so-called HostGraph in which the nodes denote websites and the edges denote linkages between websites (if and only if there are hyperlinks from the pages in one website to the pages in the other, there will be an edge between these two websites), and then adopted the random walk model in the HostGraph. However, as pointed in this paper, the random walk over such a HostGraph is not reasonable because it is not in accordance with the browsing behavior of web surfers. Therefore, the derivate rank cannot represent the true probability of visiting the corresponding website.In this work, we mathematically proved that the probability of visiting a website by the random web surfer should be equal to the sum of the PageRank values of the pages inside that website. Nevertheless, since the number of web pages is much larger than that of websites, it is not feasible to base the calculation of the ranks of websites on the calculation of PageRank. To tackle this problem, we proposed a novel method named AggregateRank rooted in the theory of stochastic complement, which cannot only approximate the sum of PageRank accurately, but also have a lower computational complexity than PageRank. Both theoretical analysis and experimental evaluation show that AggregateRank is a better method for ranking websites than previous methods.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
|
 |
3
|
|
| |
4
|
Yuangui Lei, Enrico Motta, Domingue. Modelling Data-Intensive Web Sites with OntoWeaver. In Proceedings of International Workshop on Web Information Systems Modeling, Riga, Latvia, June 2004.
|
| |
5
|
Tao Qin, Tie-Yan Liu, Xu-Dong Zhang, Guang Feng, Wei-Ying Ma, Subsite Retrieval: A Novel Concept for Topic Distillation, In Proceedings of 2nd Asia Information Retrieval Symposium, Jeju Island, Korea, October 2005
|
| |
6
|
M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA, pages 7821--7826, 2002.
|
| |
7
|
Henzinger, M.R., Motwani, R., Silverstein, C. Challenges in Web Search Engines. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (2003) 1573--1579
|
| |
8
|
L. Page, S. Brin, R. Motwani, and T. Winograd. The Pagerank Citation Ranking: Bringing Order to the Web. Technical report, Stanford Digital Libraries, 1998.
|
 |
9
|
|
| |
10
|
Jie Wu, Karl Aberer. Using SiteRank for P2P Web Retrieval. EPFL Technical Report ID: IC/2004/31, 2004
|
| |
11
|
|
| |
12
|
Stephen Dill , Ravi Kumar , Kevin S. McCurley , Sridhar Rajagopalan , D. Sivakumar , Andrew Tomkins, Self-similarity in the Web, Proceedings of the 27th International Conference on Very Large Data Bases, p.69-78, September 11-14, 2001
|
| |
13
|
|
| |
14
|
|
| |
15
|
|
| |
16
|
Amy N. Langville and Carl D. Meyer. Deeper inside PageRank. Technical report, NCSU Center for Res. Sci Comp., 2003.
|
| |
17
|
John G Kemeny and J.Laurie Snell, Finite Markov Chains Springer-Verlag, New York Berlin Herdelberg Tokyo, 1976.
|
| |
18
|
William J. Stewart. Introduction to the Numerical Solution of Markov Chains. Princeton : Princeton University Press, 1994
|
| |
19
|
S. Kamvar, T. Haveliwala, C. Manning, and G. Golub. Exploiting the block structure of the web for computing pagerank. Technical report, Stanford Univ., 2003.
|
| |
20
|
F. R. Gantmacher, Matrix Theory (Chelsea, 1959) Vol. II, Chapter XIII.
|
| |
21
|
G. E. Cho and C. D. Meyer. Aggregation/Disaggregation Methods of Nearly Uncoupled Markov Chains. http://meyer.math.ncsu.edu/Meyer/PS_Files/Numcad.ps
|
| |
22
|
R. Albert and A.-L. Barabasi. Statistical mechanics of complex networks. Rev. Mod. Phys. Vol. 74, January 2002.
|
| |
23
|
M. Kendall and J. Gibbons. Rank Correlation Methods. Edward Arnold, London, 5 edition, 1990.
|
CITED BY 4
|
|
|
|
|
|
|
|
Ramakrishna Varadarajan , Vagelis Hristidis , Louiqa Raschid , Maria-Esther Vidal , Luis Ibáñez , Héctor Rodríguez-Drumond, Flexible and efficient querying and ranking on hyperlinked data sources, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
|
|
|
Ying Liu , Kun Bai , Prasenjit Mitra , C. Lee Giles, Tablerank: a ranking algorithm for table search and retrieval, Proceedings of the 22nd national conference on Artificial intelligence, p.317-322, July 22-26, 2007, Vancouver, British Columbia, Canada
|
|