|
ABSTRACT
Data replication is a key technique for ensuring data availability. Traditionally, researchers have focused on the availability of individual objects, even though user-level tasks (called operations) typically request multiple objects. Our recent experimental study has shown that the assignment of object replicas to machines results in subtle yet dramatic effects on the availability of these operations, even though the availability of individual objects remains the same. This paper is the first to approach the assignment problem from a theoretical perspective, and obtains a series of results regarding assignments that provide the best and the worst availability for user-level operations. We use a range of techniques to obtain our results, from standard combinatorial techniques and hill climbing methods to Janson's inequality (a strong probabilistic tool). Some of the results demonstrate that even quite simple versions of the assignment problem can have surprising answers.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Atul Adya , William J. Bolosky , Miguel Castro , Gerald Cermak , Ronnie Chaiken , John R. Douceur , Jon Howell , Jacob R. Lorch , Marvin Theimer , Roger P. Wattenhofer, Farsite: federated, available, and reliable storage for an incompletely trusted environment, Proceedings of the 5th symposium on Operating systems design and implementation Due to copyright restrictions we are not able to make the PDFs for this conference available for downloading, December 09-11, 2002, Boston, Massachusetts
[doi> 10.1145/1060289.1060291]
|
| |
2
|
N. Alon and J. H. Spencer. The Probabilistic Method. John Wiley & Sons, 2000.
|
 |
3
|
William J. Bolosky , John R. Douceur , David Ely , Marvin Theimer, Feasibility of a serverless distributed file system deployed on an existing set of desktop PCs, Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, p.34-43, June 18-21, 2000, Santa Clara, California, United States
|
 |
4
|
Frank Dabek , M. Frans Kaashoek , David Karger , Robert Morris , Ion Stoica, Wide-area cooperative storage with CFS, Proceedings of the eighteenth ACM symposium on Operating systems principles, October 21-24, 2001, Banff, Alberta, Canada
|
| |
5
|
|
 |
6
|
|
| |
7
|
|
| |
8
|
Larry Huston , Rahul Sukthankar , Rajiv Wickremesinghe , M. Satyanarayanan , Gregory R. Ganger , Erik Riedel , Anastassia Ailamaki, Diamond: A Storage Architecture for Early Discard in Interactive Search, Proceedings of the 3rd USENIX Conference on File and Storage Technologies, March 31-31, 2004, San Francisco, CA
|
| |
9
|
S. Janson. Poisson Approximations for Large Deviations. Random Structures and Algorithms, 1:221--230, 1990.
|
| |
10
|
|
| |
11
|
S. Janson, T. Luczak, and A. Rucinski. An Exponential Bound for the Probability of Nonexistence of a Specified Subgraph in a Random Graph. In Random Graphs'87 (M. Karo'nski and J. Jaworski and A. Ruci'nski, Eds.). John Wiley & Sons, 1990.
|
 |
12
|
David Karger , Eric Lehman , Tom Leighton , Rina Panigrahy , Matthew Levine , Daniel Lewin, Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web, Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, p.654-663, May 04-06, 1997, El Paso, Texas, United States
[doi> 10.1145/258533.258660]
|
 |
13
|
|
| |
14
|
|
 |
15
|
John Kubiatowicz , David Bindel , Yan Chen , Steven Czerwinski , Patrick Eaton , Dennis Geels , Ramakrishna Gummadi , Sean Rhea , Hakim Weatherspoon , Chris Wells , Ben Zhao, OceanStore: an architecture for global-scale persistent storage, Proceedings of the ninth international conference on Architectural support for programming languages and operating systems, p.190-201, November 2000, Cambridge, Massachusetts, United States
|
 |
16
|
Sylvia Ratnasamy , Paul Francis , Mark Handley , Richard Karp , Scott Schenker, A scalable content-addressable network, Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, p.161-172, August 2001, San Diego, California, United States
|
| |
17
|
Sylvia Ratnasamy , Brad Karp , Scott Shenker , Deborah Estrin , Ramesh Govindan , Li Yin , Fang Yu, Data-centric storage in sensornets with GHT, a geographic hash table, Mobile Networks and Applications, v.8 n.4, p.427-442, August 2003
[doi> 10.1023/A:1024591915518]
|
| |
18
|
|
 |
19
|
Jose Renato Santos , Richard R. Muntz , Berthier Ribeiro-Neto, Comparing random data allocation and data striping in multimedia servers, Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, p.44-55, June 18-21, 2000, Santa Clara, California, United States
|
 |
20
|
Ion Stoica , Robert Morris , David Karger , M. Frans Kaashoek , Hari Balakrishnan, Chord: A scalable peer-to-peer lookup service for internet applications, Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, p.149-160, August 2001, San Diego, California, United States
|
| |
21
|
S. Suen. A Correlation Inequality and a Poisson Limit Theorem for Nonoverlapping Balanced Subgraphs of a Random Graph. Random Structures and Algorithms, 1(2):231--242, 1990.
|
 |
22
|
Alexander S. Szalay , Peter Z. Kunszt , Ani Thakar , Jim Gray , Don Slutz , Robert J. Brunner, Designing and mining multi-terabyte astronomy archives: the Sloan Digital Sky Survey, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.451-462, May 15-18, 2000, Dallas, Texas, United States
|
| |
23
|
TPC Benchmark. http://www.tpc.org/.
|
| |
24
|
L. G. Valiant. The Complexity of Enumeration and Reliability Problems. SIAM J. on Computing, 8(3):410--421, 1979.
|
| |
25
|
|
| |
26
|
|
 |
27
|
|
|