ACM Home Page
Please provide us with feedback. Feedback
Scaling link-based similarity search
Full text PdfPdf (241 KB)
Source International World Wide Web Conference archive
Proceedings of the 14th international conference on World Wide Web table of contents
Chiba, Japan
SESSION: Link-based similarity table of contents
Pages: 641 - 650  
Year of Publication: 2005
ISBN:1-59593-046-9
Authors
Dániel Fogaras  Budapest University of Technology and Economics, Budapest, Hungary
Balázs Rácz  Computer and Automation Research Institute of the Hungarian Academy of Sciences, Budapest, Hungary
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 101,   Citation Count: 9
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1060745.1060839
What is a DOI?

ABSTRACT

To exploit the similarity information hidden in the hyperlink structure of the web, this paper introduces algorithms scalable to graphs with billions of vertices on a distributed architecture. The similarity of multi-step neighborhoods of vertices are numerically evaluated by similarity functions including SimRank [20], a recursive refinement of cocitation; PSimRank, a novel variant with better theoretical characteristics; and the Jaccard coefficient, extended to multi-step neighborhoods. Our methods are presented in a general framework of Monte Carlo similarity search algorithms that precompute an index database of random fingerprints, and at query time, similarities are estimated from the fingerprints. The performance and quality of the methods were tested on the Stanford Webbase [19] graph of 80M pages by comparing our scores to similarities extracted from the ODP directory [26]. Our experimental results suggest that the hyperlink structure of vertices within four to five steps provide more adequate information for similarity search than single-step neighborhoods.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
3
 
4
 
5
 
6
 
7
8
9
 
10
 
11
 
12
D. Fogaras and B. Rácz. A scalable randomized method to compute link-based similarity rank on the web graph. Proc. of Clustering Information over the Web (ClustWeb), in conjunction with EDBT, 2004.
 
13
D. Fogaras and B. Rácz. Scaling link-based similarity search. Technical report, MTA SZTAKI, 2004. http://www.ilab.sztaki.hu/websearch/Publications/.
 
14
D. Fogaras and B. Rácz. Towards scaling fully personalized PageRank. Proc. of Third Workshop on Algorithms and Models for the Web-Graph (WAW) in conjunction with FOCS, 2004.
 
15
T. H. Haveliwala. Efficient computation of PageRank. Technical Report 1999-31, Stanford University, 1999.
16
 
17
T. H. Haveliwala, S. Kamvar, and G. Jeh. An analytical comparison of approaches to personalizing PageRank. Technical report, Stanford University, 2003.
 
18
 
19
20
21
22
 
23
 
24
 
25
 
26
Open Directory Project (ODP). http://www.dmoz.org.
 
27
L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.
28
 
29
P. Rusmevichientong, D. M. Pennock, S. Lawrence, and C. L. Giles. Methods for sampling pages uniformly from the world wide web. Proc. of AAAI Fall Symposium on Using Uncertainty Within Computation, pages 121--128, 2001.

CITED BY  9

Collaborative Colleagues:
Dániel Fogaras: colleagues
Balázs Rácz: colleagues