|
ABSTRACT
Given a huge real graph, how can we derive a representative sample? There are many known algorithms to compute interesting measures (shortest paths, centrality, betweenness, etc.), but several of them become impractical for large graphs. Thus graph sampling is essential.The natural questions to ask are (a) which sampling method to use, (b) how small can the sample size be, and (c) how to scale up the measurements of the sample (e.g., the diameter), to get estimates for the large graph. The deeper, underlying question is subtle: how do we measure success?.We answer the above questions, and test our answers by thorough experiments on several, diverse datasets, spanning thousands nodes and edges. We consider several sampling methods, propose novel methods to check the goodness of sampling, and develop a set of scaling laws that describe relations between the properties of the original and the sample.In addition to the theoretical contributions, the practical conclusions from our work are: Sampling strategies based on edge selection do not perform well; simple uniform random node selection performs surprisingly well. Overall, best performing methods are the ones based on random-walks and "forest fire"; they match very accurately both static as well as evolutionary graph patterns, with sample sizes down to about 15% of the original graph.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
| |
3
|
D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-mat: A recursive model for graph mining. In SDM, 2004.
|
| |
4
|
X. A. Dimitropoulos and G. F. Riley. Creating realistic BGP models. IEEE/ACM MASCOTS, 2003.
|
 |
5
|
Michalis Faloutsos , Petros Faloutsos , Christos Faloutsos, On power-law relationships of the Internet topology, Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, p.251-262, August 30-September 03, 1999, Cambridge, Massachusetts, United States
|
| |
6
|
|
| |
7
|
A. C. Gilbert and K. Levchenko. Compressing network graphs. In LinkKDD, 2004.
|
| |
8
|
V. Krishnamurthy, M. Faloutsos, M. Chrobak, L. Lao, J.-H. Cui, and A. G. Percus. Reducing large internet topologies for faster simulations. In Networking, 2005.
|
 |
9
|
Jure Leskovec , Jon Kleinberg , Christos Faloutsos, Graphs over time: densification laws, shrinking diameters and possible explanations, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
[doi> 10.1145/1081870.1081893]
|
| |
10
|
U. of Oregon. Route views project.
|
 |
11
|
|
| |
12
|
D. Rafiei and S. Curial. Effectively visualizing large networks through sampling. In Visualization, 2005.
|
| |
13
|
M. Richardson, R. Agrawal, and P. Domingos. Trust management for the semantic web. In Second International Semantic Web Conference, 2003.
|
| |
14
|
M. P. H. Stumpf, C. Wiuf, and R. M. May. Subnets of scale-free networks are not scale-free: Sampling properties of networks. In PNAS, volume 102, 2005.
|
| |
15
|
D. Stutzbach, R. Rejaie, N. Duffield, S. Sen, and W. Willinger. Sampling techniques for large, dynamics graphs. In CIS-TR-06-01, University of Oregon, 2006.
|
| |
16
|
D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world'networks. Nature , 393:440--442, 1998.
|
CITED BY 8
|
|
|
|
|
|
|
|
|
|
|
Vaishnavi Krishnamurthy , Michalis Faloutsos , Marek Chrobak , Jun-Hong Cui , Li Lao , Allon G. Percus, Sampling large Internet topologies for simulation purposes, Computer Networks: The International Journal of Computer and Telecommunications Networking, v.51 n.15, p.4284-4302, October, 2007
|
|
|
Hyunwoo Chun , Haewoon Kwak , Young-Ho Eom , Yong-Yeol Ahn , Sue Moon , Hawoong Jeong, Comparison of online social relations in volume vs interaction: a case study of cyworld, Proceedings of the 8th ACM SIGCOMM conference on Internet measurement, October 20-22, 2008, Vouliagmeni, Greece
|
|
|
Xiaolin Shi , Matthew Bonner , Lada A. Adamic , Anna C. Gilbert, The very small world of the well-connected, Proceedings of the nineteenth ACM conference on Hypertext and hypermedia, June 19-21, 2008, Pittsburgh, PA, USA
|
|
|
|
|
|
|
|