ACM Home Page
Please provide us with feedback. Feedback
Minimizing churn in distributed systems
Full text PdfPdf (358 KB)
Source Applications, Technologies, Architectures, and Protocols for Computer Communication archive
Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications table of contents
Pisa, Italy
SESSION: Analysis table of contents
Pages: 147 - 158  
Year of Publication: 2006
ISBN:1-59593-308-5
Also published in ...
Authors
P. Brighten Godfrey  UC Berkeley
Scott Shenker  UC Berkeley
Ion Stoica  UC Berkeley
Sponsors
SIGCOMM: ACM Special Interest Group on Data Communication
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 148,   Citation Count: 16
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1159913.1159931
What is a DOI?

ABSTRACT

A pervasive requirement of distributed systems is to deal with churn-change in the set of participating nodes due to joins, graceful leaves, and failures. A high churn rate can increase costs or decrease service quality. This paper studies how to reduce churn by selecting which subset of a set of available nodes to use.First, we provide a comparison of the performance of a range of different node selection strategies in five real-world traces. Among our findings is that the simple strategy of picking a uniform-random replacement whenever a node fails performs surprisingly well. We explain its performance through analysis in a stochastic model.Second, we show that a class of strategies, which we call "Preference List" strategies, arise commonly as a result of optimizing for a metric other than churn, and produce high churn relative to more randomized strategies under realistic node failure patterns. Using this insight, we demonstrate and explain differences in performance for designs that incorporate varying degrees of randomization. We give examples from a variety of protocols, including anycast, over-lay multicast, and distributed hash tables. In many cases, simply adding some randomization can go a long way towards reducing churn.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
B. C. Arnold. Majorization and the Lorenz order: A Brief Introduction, volume 43. Lecture Notes in Statistics, Springer-Verlag, 1987.
 
2
M. Bakkaloglu, J. J. Wylie, C. Wang, and G. R. Ganger. On correlated failures in survivable storage systems. Technical Report CMU-CS-02-129, Carnegie Mellon University, May 2002.
3
 
4
A. Bavier, M. Bowman, B. Chun, D. Culler, S. Karlin, S. Muir, L. Peterson, T. Roscoe, T. Spalink, and M. Wawrzoniak. Operating system support for planetary-scale network services. In Proc. NSDI, 2004.
 
5
R. Bhagwan, K. Tati, Y.-C. Cheng, S. Savage, and G. M. Voelker. Total recall: System support for automated availability management. In NSDI, 2004.
 
6
C. Blake and R. Rodrigues. High availability, scalable storage, dynamic peer neetworks: Pick two. In Proc. HOTOS, May 2003.
7
8
9
 
10
11
 
12
M. J. Freedman, K. Lakshminarayanan, and D. Mazieres. Oasis: Anycast for any service. In NSDI, 2006.
 
13
L. Garces-Erice, E. W. Biersack, K. W. Ross, P. A. Felber, and G. Urvoy-Keller. Hierarchical P2P systems. In Proc. ACM/IFIP International Conference on Parallel and Distributed Computing (Euro-Par), 2003.
 
14
P. B. Godfrey, S. Shenker, and I. Stoica. Minimizing churn in distributed systems. Technical Report EECS-2006-25, EECS Department, University of California, Berkeley, 2006.
 
15
S. Guha, N. Daswani, and R. Jain. An experimental study of the Skype peer-to-peer VoIP system. In IPTPS, 2006.
16
 
17
M. F. Kaashoek and D. R. Karger. Koorde: A simple degree-optimal distributed hash table. In Proc. IPTPS, 2003.
 
18
J. Ledlie, J. Shneidman, M. Amis, M. Mitzenmacher, and M. Seltzer. Reliability- and capacity-based selection in distributed hash tables. Technical report, Harvard University Computer Science, Sept. 2003.
19
 
20
J. Li, J. Stribling, R. Morris, and M. F. Kaashoek. Bandwidth-efficient management of DHT routing tables. In Proc. NSDI, 2005.
 
21
J. Li, J. Stribling, R. Morris, M. F. Kaashoek, and T. M. Gil. A performance vs. cost framework for evaluating DHT design tradeoffs under churn. In Proc. INFOCOM, 2005.
 
22
G. S. Manku, M. Bawa, and P. Raghavan. Symphony: distributed hashing in a small world. In USENIX Symposium on Internet Technologies and Systems, 2003.
23
24
25
26
 
27
S. Rhea, B.-G. Chun, J. Kubiatowicz, and S. Shenker. Fixing the embarrassing slowness of OpenDHT on PlanetLab. In Proc. WORLDS, 2005.
 
28
 
29
S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz. Handling churn in a DHT. In Proc. USENIX Annual Technical Conference, June 2004.
30
 
31
S. Saroiu, P. K. Gummadi, and S. D. Gribble. A Measurement Study of Peer-to-Peer File Sharing Systems. In Proc. MMCN, 2002.
32
33
34
 
35
J. Stribling. Planetlab all pairs ping. http://infospect.planet-lab.org/pings.
 
36
K. Tati and G. M. Voelker. On object maintenance in peer-to-peer systems. In Proc. IPTPS, 2006.
 
37
H. Weatherspoon, B.-G. Chun, C. W. So, and J. Kubiatowicz. Long-Term Data Maintenance: A Quantitative Approach. Technical Report UCB/CSD-05-1404, EECS Department, University of California, Berkeley, July 2005.
38
 
39
B. Zhang, T. S. Ng, A. Nandi, R. Riedi, P. Druschel, and G. Wang. Measurement-based analysis, modeling, and synthesis of the Internet delay space, 2006. http://www.cs.rice.edu/bozhang/synthesizer.html.

CITED BY  16

Collaborative Colleagues:
P. Brighten Godfrey: colleagues
Scott Shenker: colleagues
Ion Stoica: colleagues