| Estimating PageRank on graph streams |
| Full text |
Pdf
(371 KB)
|
Source
|
Symposium on Principles of Database Systems
archive
Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
table of contents
Vancouver, Canada
SESSION: Streaming data & best paper award
table of contents
Pages 69-78
Year of Publication: 2008
ISBN:978-1-60558-152-1
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 17, Downloads (12 Months): 154, Citation Count: 1
|
|
|
ABSTRACT
This study focuses on computations on large graphs (e.g., the web-graph) where the edges of the graph are presented as a stream. The objective in the streaming model is to use small amount of memory (preferably sub-linear in the number of nodes n) and a few passes. In the streaming model, we show how to perform several graph computations including estimating the probability distribution after a random walk of length l, mixing time, and the conductance. We estimate the mixing time M of a random walk in Õ(nα+Mα√n+√Mn/ α) space and Õ(√Mα) passes. Furthermore, the relation between mixing time and conductance gives us an estimate for the conductance of the graph. By applying our algorithm for computing probability distribution on the web-graph, we can estimate the PageRank p of any node up to an additive error of √εp in Õ(√M/α) passes and Õ(min(nα + 1/ε √M/α + 1/ε Mα, αn√Mα + 1/ε √M/α)) space, for any α ∈ (0, 1]. In particular, for ε = M/n, by setting α = M--1/2, we can compute the approximate PageRank values in Õ(nM--1/4) space and Õ(M3/4) passes. In comparison, a standard implementation of the PageRank algorithm will take O(n) space and O(M) passes.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
 |
3
|
Roy Armoni , Amnon Ta-Shma , Avi Wigderson , Shiyu Zhou, SL ⊆L4/3, Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, p.230-239, May 04-06, 1997, El Paso, Texas, United States
[doi> 10.1145/258533.258593]
|
| |
4
|
Ziv Bar-Yossef , Ravi Kumar , D. Sivakumar, Reductions in streaming algorithms, with an application to counting triangles in graphs, Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, p.623-632, January 06-08, 2002, San Francisco, California
|
| |
5
|
T. Batu , L. Fortnow , E. Fischer , R. Kumar , R. Rubinfeld , P. White, Testing Random Variables for Independence and Identity, Proceedings of the 42nd IEEE symposium on Foundations of Computer Science, p.442, October 14-17, 2001
|
| |
6
|
|
 |
7
|
Lakshminath Bhuvanagiri , Sumit Ganguly , Deepanjan Kesh , Chandan Saha, Simpler algorithm for estimating frequency moments of data streams, Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, p.708-713, January 22-26, 2006, Miami, Florida
[doi> 10.1145/1109557.1109634]
|
| |
8
|
|
 |
9
|
Luciana S. Buriol , Gereon Frahling , Stefano Leonardi , Alberto Marchetti-Spaccamela , Christian Sohler, Counting triangles in data streams, Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 26-28, 2006, Chicago, IL, USA
[doi> 10.1145/1142351.1142388]
|
 |
10
|
|
 |
11
|
|
| |
12
|
|
| |
13
|
Joan Feigenbaum , Sampath Kannan , Andrew McGregor , Siddharth Suri , Jian Zhang, Graph distances in the streaming model: the value of space, Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, January 23-25, 2005, Vancouver, British Columbia
|
| |
14
|
Jon Feldman, S. Muthukrishnan, Anastasios Sidiropoulos, Cliff Stein, and Zoya Svitkina. On the complexity of processing massive, unordered, distributed data. In CoRR abs/cs/0611108, 2006.
|
 |
15
|
|
 |
16
|
|
| |
17
|
Sudipto Guha and Andrew McGregor. Space-efficient sampling. In In AISTATS, pages 169--176, 2007.
|
 |
18
|
|
| |
19
|
Sudipto Guha and Andrew McGrgor. Lower bounds for quantile estimation in random-order and multi-pass streaming. In ICALP, pages 704--715, 2007.
|
| |
20
|
|
| |
21
|
P. Indyk and D. P. Woodruff. Optimal approximations of the frequency moments of data streams. In IEEE Symposium on Foundations of Computer Science, FOCS, pages 283--292, 2003.
|
 |
22
|
|
| |
23
|
|
| |
24
|
H. Jowhari and M. Ghodsi. New streaming algorithms for counting triangles in graphs. In In COCOON, pages 710--716, 2005.
|
 |
25
|
Gurmeet Singh Manku , Sridhar Rajagopalan , Bruce G. Lindsay, Random sampling techniques for space efficient online computation of order statistics of large datasets, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.251-262, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
26
|
A. McGregor. Finding graph matchings in data streams. In In APPROX-RANDOM, pages 170--181, 2005.
|
 |
27
|
|
 |
28
|
Tamás Sarlós , Adrás A. Benczúr , Károly Csalogány , Dániel Fogaras , Balázs Rácz, To randomize or not to randomize: space optimal summaries for hyperlink analysis, Proceedings of the 15th international conference on World Wide Web, May 23-26, 2006, Edinburgh, Scotland
[doi> 10.1145/1135777.1135823]
|
| |
29
|
John Wicks and Amy R. Greenwald. Parallelizing the computation of pagerank. In Proc. 5th Workshop On Algorithms And Models For The Web-Graph (WAW), pages 202--208, 2007.
|
| |
30
|
|
|