|
ABSTRACT
This paper presents an algorithm by which a process in a distributed system determines a global state of the system during a computation. Many problems in distributed systems can be cast in terms of the problem of detecting global states. For instance, the global state detection algorithm helps to solve an important class of problems: stable property detection. A stable property is one that persists: once a stable property becomes true it remains true thereafter. Examples of stable properties are “computation has terminated,” “ the system is deadlocked” and “all tokens in a token ring have disappeared.” The stable property detection problem is that of devising algorithms to detect a given stable property. Global state detection can also be used for checkpointing.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
|
| |
3
|
DIJKSTRA, E.W. The distributed snapshot of K. M. Chandy and L. Lamport. Tech. Rep. EWD 864a, Univ. of Texas, Austin, Tex., 1984.
|
| |
4
|
DIJKSTRA, E. W., AND SCHOLTEN, C.S. Termination detection for diffusing computations. Inf. Proc. Lett. 11, 1 (Aug. 1980), 1-4.
|
| |
5
|
GLIGOR, V. D., AND SHATTUCK, S.H. Deadlock detection in distributed systems. IEEE Trans. Softw. Eng. SE-6, 5 (Sep. 1980), 435-440.
|
 |
6
|
|
| |
7
|
LAMPORT, L., AND CHANDY, K.M. On partially-ordered event models of distributed computations. Submitted for publication.
|
| |
8
|
MAHOUD, S. A., AND RIORDAN, J. S. Software controlled access to distributed databases. INFOR 15, 1 (Feb. 1977), 22-36.
|
| |
9
|
MENASCE, D., AND MUNTZ, R. Locking and deadlock detection in distributed data bases. IEEE Trans. Softw. Eng. SE-5, 3 (May 1979), 195-202.
|
 |
10
|
|
 |
11
|
|
CITED BY 333
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Dalia Malki , Ken Birman , Aleta Ricciardi , André Schiper, Uniform actions in asynchronous distributed systems, Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing, p.274-283, August 14-17, 1994, Los Angeles, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Paul F. Reynolds, Jr. , Carmen M. Pancerella , Sudhir Srinivasan, Making parallel simulations go fast, Proceedings of the 24th conference on Winter simulation, p.646-656, December 13-16, 1992, Arlington, Virginia, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yehuda Afek , Danny Dolev , Hagit Attiya , Eli Gafni , Michael Merritt , Nir Shavit, Atomic snapshots of shared memory, Proceedings of the ninth annual ACM symposium on Principles of distributed computing, p.1-13, August 22-24, 1990, Quebec City, Quebec, Canada
|
|
|
Nuno Neves , Miguel Castro , Paulo Guedes, A checkpoint protocol for an entry consistent shared memory system, Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing, p.121-129, August 14-17, 1994, Los Angeles, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Eli Gafni , Michael Merritt , Gadi Taubenfeld, The concurrency hierarchy, and algorithms for unbounded concurrency, Proceedings of the twentieth annual ACM symposium on Principles of distributed computing, p.161-169, August 2001, Newport, Rhode Island, United States
|
|
|
|
|
|
|
|
|
|
|
|
Manhoi Choy , Hong V. Leong , Man Hon Wong, On distributed object checkpointing and recovery, Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing, p.64-73, August 20-23, 1995, Ottowa, Ontario, Canada
|
|
|
|
|
|
Danny Dolev , Eli Gafni , Nir Shavit, Toward a non-atomic era: l-exclusion as a test case, Proceedings of the twentieth annual ACM symposium on Theory of computing, p.78-92, May 02-04, 1988, Chicago, Illinois, United States
|
|
|
Yehuda Afek , Eli Gafni , Adi Rosén, The slide mechanism with applications in dynamic networks, Proceedings of the eleventh annual ACM symposium on Principles of distributed computing, p.35-46, August 10-12, 1992, Vancouver, British Columbia, Canada
|
|
|
|
|
|
Vijay A. Saraswat , Kenneth Kahn , David Weinbaum, Detecting stable properties of networks in concurrent logic programming languages, Proceedings of the seventh annual ACM Symposium on Principles of distributed computing, p.210-222, August 15-17, 1988, Toronto, Ontario, Canada
|
|
|
|
|
|
|
|
|
Alan Demers , Dan Greene , Carl Hauser , Wes Irish , John Larson , Scott Shenker , Howard Sturgis , Dan Swinehart , Doug Terry, Epidemic algorithms for replicated database maintenance, Proceedings of the sixth annual ACM Symposium on Principles of distributed computing, p.1-12, August 10-12, 1987, Vancouver, British Columbia, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Anish Arora , Sandeep Kulkarni , Murat Demirbas, Resettable vector clocks, Proceedings of the nineteenth annual ACM symposium on Principles of distributed computing, p.269-278, July 16-19, 2000, Portland, Oregon, United States
|
|
|
|
|
|
|
|
|
|
|
|
Roberto Baldoni , Jean-Michel Hélary , Michel Raynal, Rollback-dependency trackability: visible characterizations, Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing, p.33-42, May 04-06, 1999, Atlanta, Georgia, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yehuda Afek , Gad M. Landau , Baruch Schieber , Moti Yung, The power of multimedia: combining point-to point and multi-access networks, Proceedings of the seventh annual ACM Symposium on Principles of distributed computing, p.90-104, August 15-17, 1988, Toronto, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Greg Bronevetsky , Daniel Marques , Keshav Pingali , Paul Stodghill, Collective operations in application-level fault-tolerant MPI, Proceedings of the 17th annual international conference on Supercomputing, June 23-26, 2003, San Francisco, CA, USA
|
|
|
|
|
|
Amy L. Murphy , Gruia-Catalin Roman , George Varghese, An algorithm for message delivery to mobile units, Proceedings of the sixteenth annual ACM symposium on Principles of distributed computing, p.292, August 21-24, 1997, Santa Barbara, California, United States
|
|
|
|
|
|
Christoper H. Young , Radharamanan Radhakrishnan , Philip A. Wilsey, Optimism: not just for event execution anymore, Proceedings of the thirteenth workshop on Parallel and distributed simulation, p.136-143, May 01-04, 1999, Atlanta, Georgia, United States
|
|
|
|
|
|
Robert H. B. Netzer , Timothy W. Brennan , Suresh K. Damodaran-Kamal, Debugging race conditions in message-passing programs, Proceedings of the SIGMETRICS symposium on Parallel and distributed tools, p.31-40, May 22-23, 1996, Philadelphia, Pennsylvania, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Haim Gaifman , Michael J. Maher , Ehud Shapiro, Replay, recovery, replication, and snapshots of nondeterministic concurrent programs, Proceedings of the tenth annual ACM symposium on Principles of distributed computing, p.241-255, August 19-21, 1991, Montreal, Quebec, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jean-Michael Helary , Claude Jard , Noël Plouzeau , Michel Raynal, Detection of stable properties in distributed applications, Proceedings of the sixth annual ACM Symposium on Principles of distributed computing, p.125-136, August 10-12, 1987, Vancouver, British Columbia, Canada
|
|
|
|
|
|
|
|
|
Yehuda Afek , Hagit Attiya , Danny Dolev , Eli Gafni , Michael Merritt , Nir Shavit, Atomic snapshots of shared memory, Journal of the ACM (JACM), v.40 n.4, p.873-890, Sept. 1993
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
David W. Miller , Jinhua Guo , Eileen Kraemer , Yin Xiong, On-the-fly calculation and verification of consistent steering transactions, Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM), p.1-17, November 10-16, 2001, Denver, Colorado
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
R. Kordale , M. Ahamad , M. Devarakonda, Object caching in a CORBA compliant system, Proceedings of the 2nd conference on USENIX Conference on Object-Oriented Technologies (COOTS), p.6-6, June 17-21, 1996, Toronto, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Camille Coti , Thomas Herault , Pierre Lemarinier , Laurence Pilard , Ala Rezmerita , Eric Rodriguez , Franck Cappello, MPI tools and performance studies---Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, November 11-17, 2006, Tampa, Florida
|
|
|
Ramakrishna Gummadi , Nupur Kothari , Todd Millstein , Ramesh Govindan, Declarative failure recovery for sensor networks, Proceedings of the 6th international conference on Aspect-oriented software development, March 12-16, 2007, Vancouver, British Columbia, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ranganath Atreya , Neeraj Mittal , Ajay D. Kshemkalyani , Vijay K. Garg , Mukesh Singhal, Efficient detection of a locally stable predicate in a distributed system, Journal of Parallel and Distributed Computing, v.67 n.4, p.369-385, April, 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Bouteiller Bouteiller , Franck Cappello , Thomas Herault , Krawezik Krawezik , Pierre Lemarinier , Magniette Magniette, MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging, Proceedings of the 2003 ACM/IEEE conference on Supercomputing, p.25, November 15-21, 2003
|
|
|
Martin Schulz , Greg Bronevetsky , Rohit Fernandes , Daniel Marques , Keshav Pingali , Paul Stodghill, Implementation and Evaluation of a Scalable Application-Level Checkpoint-Recovery Scheme for MPI Programs, Proceedings of the 2004 ACM/IEEE conference on Supercomputing, p.38, November 06-12, 2004
|
|
|
|
|
|
Alan Demers , Dan Greene , Carl Houser , Wes Irish , John Larson , Scott Shenker , Howard Sturgis , Dan Swinehart , Doug Terry, Epidemic algorithms for replicated database maintenance, ACM SIGOPS Operating Systems Review, v.22 n.1, p.8-32, Jan., 1988
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
John P. John , Ethan Katz-Bassett , Arvind Krishnamurthy , Thomas Anderson , Arun Venkataramani, Consensus routing: the internet as a distributed system, Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation, p.351-364, April 16-18, 2008, San Francisco, California
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ardalan Kangarlou , Dongyan Xu , Paul Ruth , Patrick Eugster, Taking snapshots of virtual networked environments, Proceedings of the 3rd international workshop on Virtualization technology in distributed computing, p.1-8, November 12-12, 2007, Reno, Nevada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Andreas Haeberlen , Ioannis Avramopoulos , Jennifer Rexford , Peter Druschel, NetReview: detecting when interdomain routing goes wrong, Proceedings of the 6th USENIX symposium on Networked systems design and implementation, p.437-452, April 22-24, 2009, Boston, Massachusetts
|
|
|
Maysam Yabandeh , Nikola Knezevic , Dejan Kostic , Viktor Kuncak, CrystalBall: predicting and preventing inconsistencies in deployed distributed systems, Proceedings of the 6th USENIX symposium on Networked systems design and implementation, p.229-244, April 22-24, 2009, Boston, Massachusetts
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Long Vu , Indranil Gupta , Jin Liang , Klara Nahrstedt, Measurement and modeling of a large-scale overlay for multimedia streaming, The Fourth International Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness & Workshops, August 14-17, 2007, Vancouver, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Franck Cappello , Al Geist , Bill Gropp , Laxmikant Kale , Bill Kramer , Marc Snir, Toward Exascale Resilience, International Journal of High Performance Computing Applications, v.23 n.4, p.374-388, November 2009
|
|
|
Dong H. Ahn , Bronis R. de Supinski , Ignacio Laguna , Gregory L. Lee , Ben Liblit , Barton P. Miller , Martin Schulz, Scalable temporal order analysis for large scale debugging, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, November 14-20, 2009, Portland, Oregon
|
|
|
|
|
|
|
|
|
|
|
|
Tamim Sookoor , Timothy Hnat , Pieter Hooimeijer , Westley Weimer , Kamin Whitehouse, Macrodebugging: global views of distributed program execution, Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems, November 04-06, 2009, Berkeley, California
|
|