ACM Home Page
Please provide us with feedback. Feedback
On unbiased sampling for unstructured peer-to-peer networks
Full text PdfPdf (755 KB)
Source IEEE/ACM Transactions on Networking (TON) archive
Volume 17 ,  Issue 2  (April 2009) table of contents
Pages 377-390  
Year of Publication: 2009
ISSN:1063-6692
Authors
Daniel Stutzbach  Stutzbach Enterprises, LLC, Dallas, TX
Reza Rejaie  Department of Computer Science, University of Oregon, Eugene, OR
Nick Duffield  AT&T Labs--Research, Florham Park, NJ
Subhabrata Sen  AT&T Labs--Research, Florham Park, NJ
Walter Willinger  AT&T Labs--Research, Florham Park, NJ
Publisher
IEEE Press  Piscataway, NJ, USA
Bibliometrics
Downloads (6 Weeks): 37,   Downloads (12 Months): 132,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: 10.1109/TNET.2008.2001730

ABSTRACT

This paper presents a detailed examination of how the dynamic and heterogeneous nature of real-world peer-to-peer systems can introduce bias into the selection of representative samples of peer properties (e.g., degree, link bandwidth, number of files shared). We propose the Metropolized Random Walk with Backtracking (MRWB) as a viable and promising technique for collecting nearly unbiased samples and conduct an extensive simulation study to demonstrate that our technique works well for a wide variety of commonly-encountered peer-to-peer network conditions. We have implemented the MRWB algorithm for selecting peer addresses uniformly at random into a tool called ion-sampler. Using the Gnutella network, we empirically show that ion-sampler. yields more accurate samples than tools that rely on commonly-used sampling techniques and results in dramatic improvements in efficiency and scalability compared to performing a full crawl.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
 
4
R. Bhagwan, S. Savage, and G. Voelker, "Understanding availability," presented at the 2003 Int. Workshop on Peer-to-Peer Systems, Berkeley, CA.
5
 
6
S. Chib and E. Greenberg, "Understanding the Metropolis-Hastings algorithm," The Americian Statistician, vol. 49, no. 4, pp. 327-335, Nov. 1995.
 
7
W. Hastings, "Monte carlo sampling methods using Markov chains and their applications," Biometrika, vol. 57, pp. 97-109, 1970.
 
8
N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller, "Equations of state calculations by fast computing machines," J. Chem. Phys., vol. 21, pp. 1087-1092, 1953.
 
9
10
 
11
D. Stutzbach, R. Rejaie, N. Duffield, S. Sen, and W. Willinger, "Sampling techniques for large, dynamic graphs," presented at the 2006 Global Internet Symp., Barcelona, Spain, Apr. 2006.
 
12
D. Stutzbach and R. Rejaie, "Capturing accurate snapshots of the Gnutella network," in Proc. 2005 Global Internet Symp., Miami, FL, Mar. 2005, pp. 127-132.
 
13
B. Bollobás, "A probabilistic proof of an asymptotic formula for the number of labelled regular graphs," Eur. J. Combinator., vol. 1, pp. 311-316, 1980.
 
14
 
15
 
16
V. Krishnamurthy, J. Sun, M. Faloutsos, and S. Tauro, "Sampling Internet topologies: How small can we go?," in Proc. 2003 Int. Conf. Internet Computing, Las Vegas, NV, Jun. 2003, pp. 577-580.
 
17
V. Krishnamurthy, M. Faloutsos, M. Chrobak, L. Lao, and J.-H. C. G. Percus, "Reducing large Internet topologies for faster simulations," presented at the 2005 IFIP Networking Conf., Waterloo, Ontario, Canada, May 2005.
 
18
M. P. H. Stumpf, C. Wiuf, and R. M. May, "Subnets of scale-free networks are not scale-free: Sampling properties of networks," Proc. National Academy of Sciences, vol. 102, no. 12, pp. 4221-4224, Mar. 2005.
 
19
 
20
A. Lakhina, J. W. Byers, M. Crovella, and P. Xie, "Sampling biases in IP topology measurements," presented at the IEEE INFOCOM 2003, San Francisco, CA.
21
 
22
P. Rusmevichientong, D. M. Pennock, S. Lawrence, and C. L. Giles, "Methods for sampling pages uniformly from the World Wide Web," in Proc. AAAI Fall Symp. Using Uncertainty Within Computation, 2001, pp. 121-128.
 
23
 
24
L. Lovász, "Random walks on graphs: A survey," Combinatorics: Paul Erdös is Eighty, vol. 2, pp. 1-46, 1993.
25
26
 
27
C. Gkantsidis, M. Mihail, and A. Saberi, "Random walks in peer-to-peer networks," presented at the IEEE INFOCOM 2004, Hong Kong.
 
28
V. Vishnumurthy and P. Francis, "On heterogeneous overlay construction and random node selection in unstructured P2P networks," presented at the IEEE INFOCOM 2006, Barcelona, Spain, Apr. 2006.
 
29
M. Salganik and D. Heckathorn, "Sampling and estimation in hidden populations using respondent-driven sampling," Sociological Methodology , vol. 34, pp. 193-239, 2004.
 
30
D. Heckathorn, "Respondent-driven sampling: a new approach to the study of hidden populations," Social Problems, vol. 44, no. 2, pp. 174-199, 1997.
 
31
L. Goodman, "Snowball sampling," Ann. Math. Statist., vol. 32, pp. 148-170, 1961.
 
32
E. Deaux and J. Callaghan, "Key informant versus self-report estimates of health behavior," Eval. Rev., vol. 9, pp. 365-368, 1985.
 
33
J. Watters and P. Biernack, "Targeted sampling: Options for the study of hidden populations," Annu. Rev. Sociol., vol. 12, pp. 401-429, 1989.
 
34
M. Salganik, "Variance estimation, design effects, and sample size calculation for respondent-driven sampling," J. Urban Health, vol. 83, pp. 98-112, 2006, Suppl. 1.
35
 
36
A. Bonato, "A survey of models of the web graph," Combinatorial and Algorithmic Aspects of Networking, pp. 159-172, 2004.
 
37
J. Pouwelse, P. Garbacki, D. Epema, and H. Sips, "The BitTorrent P2P file-sharing system: Measurements and analysis," presented at the 2005 Int. Workshop on Peer-to-Peer Systems (IPTPS), Ithaca, NY.
 
38
M. Izal, G. Urvoy-Keller, E. W. Biersack, P. A. Felber, A. A. Hamra, and L. Garces-Erice, "Dissecting BitTorrent: Five months in a torrent's lifetime," presented at the Passive and Active Measurement Conf. (PAM), Antibes Juan-les-Pins, France, Apr. 2004.
 
39
40
41
 
42
 
43
J. Li, J. Stribling, F. Kaashoek, R. Morris, and T. Gil, "A performance vs. cost framework for evaluating DHT design tradeoffs under churn," presented at the IEEE INFOCOM 2005, Miami, FL.
 
44
F. E. Bustamante and Y. Qiao, "Friendships that last: Peer lifespan and its role in P2P protocols," in Proc. Int. Workshop on Web Content Caching and Distribution, 2003, p. 2.
 
45
46
47
 
48
H. Dämpfling, Gnutella Web Caching System: Version 2 Specifications Client Developers' Guide. Jun. 2003 [Online]. Available: http://www. gnucleus.com/gwebcache/newgwc.html
 
49
P. Karbhari, M. Ammar, A. Dhamdhere, H. Raj, G. Riley, and E. Zegura, "Bootstrapping in Gnutella: A measurement study," presented at the 2004 Passive and Active Measurement Conf. (PAM) Antibes Juan-les-Pins, France, Apr. 2004.
 
50
R. Srinivasan, Importance Sampling--Application in Communications and Detection. Berlin, Germany: Springer-Verlag, 2002.
 
51
D. Stutzbach and R. Rejaie, "Improving lookup performance over a widely-deployed DHT," presented at the IEEE INFOCOM 2006, Barcelona, Spain, Apr. 2006.

Collaborative Colleagues:
Daniel Stutzbach: colleagues
Reza Rejaie: colleagues
Nick Duffield: colleagues
Subhabrata Sen: colleagues
Walter Willinger: colleagues