ACM Home Page
Please provide us with feedback. Feedback
Sampling from large graphs
Full text PdfPdf (734 KB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Philadelphia, PA, USA
POSTER SESSION: Research track posters table of contents
Pages: 631 - 636  
Year of Publication: 2006
ISBN:1-59593-339-5
Authors
Jure Leskovec  Carnegie Mellon University, Pittsburgh, PA
Christos Faloutsos  Carnegie Mellon University, Pittsburgh, PA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 21,   Downloads (12 Months): 175,   Citation Count: 8
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1150402.1150479
What is a DOI?

ABSTRACT

Given a huge real graph, how can we derive a representative sample? There are many known algorithms to compute interesting measures (shortest paths, centrality, betweenness, etc.), but several of them become impractical for large graphs. Thus graph sampling is essential.The natural questions to ask are (a) which sampling method to use, (b) how small can the sample size be, and (c) how to scale up the measurements of the sample (e.g., the diameter), to get estimates for the large graph. The deeper, underlying question is subtle: how do we measure success?.We answer the above questions, and test our answers by thorough experiments on several, diverse datasets, spanning thousands nodes and edges. We consider several sampling methods, propose novel methods to check the goodness of sampling, and develop a set of scaling laws that describe relations between the properties of the original and the sample.In addition to the theoretical contributions, the practical conclusions from our work are: Sampling strategies based on edge selection do not perform well; simple uniform random node selection performs surprisingly well. Overall, best performing methods are the ones based on random-walks and "forest fire"; they match very accurately both static as well as evolutionary graph patterns, with sample sizes down to about 15% of the original graph.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-mat: A recursive model for graph mining. In SDM, 2004.
 
4
X. A. Dimitropoulos and G. F. Riley. Creating realistic BGP models. IEEE/ACM MASCOTS, 2003.
5
 
6
 
7
A. C. Gilbert and K. Levchenko. Compressing network graphs. In LinkKDD, 2004.
 
8
V. Krishnamurthy, M. Faloutsos, M. Chrobak, L. Lao, J.-H. Cui, and A. G. Percus. Reducing large internet topologies for faster simulations. In Networking, 2005.
9
 
10
U. of Oregon. Route views project.
11
 
12
D. Rafiei and S. Curial. Effectively visualizing large networks through sampling. In Visualization, 2005.
 
13
M. Richardson, R. Agrawal, and P. Domingos. Trust management for the semantic web. In Second International Semantic Web Conference, 2003.
 
14
M. P. H. Stumpf, C. Wiuf, and R. M. May. Subnets of scale-free networks are not scale-free: Sampling properties of networks. In PNAS, volume 102, 2005.
 
15
D. Stutzbach, R. Rejaie, N. Duffield, S. Sen, and W. Willinger. Sampling techniques for large, dynamics graphs. In CIS-TR-06-01, University of Oregon, 2006.
 
16
D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world'networks. Nature , 393:440--442, 1998.

CITED BY  8

Collaborative Colleagues:
Jure Leskovec: colleagues
Christos Faloutsos: colleagues