ACM Home Page
Please provide us with feedback. Feedback
Speeding up algorithms on compressed web graphs
Full text PdfPdf (537 KB)
Source Web Search and Web Data Mining archive
Proceedings of the Second ACM International Conference on Web Search and Data Mining table of contents
Barcelona, Spain
SESSION: Graph mining and web content table of contents
Pages 272-281  
Year of Publication: 2009
ISBN:978-1-60558-390-7
Authors
Chinmay Karande  Georgia Inst. Technology, Atlanta, GA
Kumar Chellapilla  Microsoft Live Labs, Bellevue, WA
Reid Andersen  Microsoft Live Labs, Bellevue, WA
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
: Google
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
: Yahoo! Research
Microsoft : Microsoft
: Nokia
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 14,   Downloads (12 Months): 149,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1498759.1498836
What is a DOI?

ABSTRACT

A variety of lossless compression schemes have been proposed to reduce the storage requirements of web graphs. One successful approach is virtual node compression [7], in which often-used patterns of links are replaced by links to virtual nodes, creating a compressed graph that succinctly represents the original. In this paper, we show that several important classes of web graph algorithms can be extended to run directly on virtual node compressed graphs, such that their running times depend on the size of the compressed graph rather than the original. These include algorithms for link analysis, estimating the size of vertex neighborhoods, and a variety of algorithms based on matrix-vector products and random walks. Similar speed-ups have been obtained previously for classical graph algorithms like shortest paths and maximum bipartite matching. We measure the performance of our modified algorithms on several publicly available web graph datasets, and demonstrate significant empirical speedups that nearly match the compression ratios.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
L. Becchetti, C. Castillo, D. Donato, S. Leonardi, and R. Baeza-Yates. Using rank propagation and probabilistic counting for link-based spam detection. In Proceedings of the Workshop on Web Mining and Web Usage Analysis (WebKDD), Pennsylvania, USA, August 2006. ACM Press.
 
3
4
 
5
 
6
7
 
8
F. R. K. Chung. Spectral Graph Theory.
 
9
 
10
 
11
12
 
13
R. Kannan and V. Vinay. Analyzing the structure of large graphs. Manuscript, 1999.
 
14
15
 
16
 
17
18
 
19
L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.
20

Collaborative Colleagues:
Chinmay Karande: colleagues
Kumar Chellapilla: colleagues
Reid Andersen: colleagues