The ACM-EATCS Edsger W. Dijkstra Prize in Distributed Computing was created to acknowledge outstanding papers on the principles of distributed computing whose significance and impact on the theory or practice of distributed computing have been evident for at least a decade. The prize is sponsored jointly by the ACM Symposium on Principles of Distributed Computing (PODC) and the EATCS Symposium on Distributed Computing (DISC). This award is presented annually, with the presentation taking place alternately at PODC and DISC, and will be presented at PODC in 2010.
- Tushar D. Chandra and Sam Toueg. Unreliable Failure Detectors for Reliable Distributed Systems, Journal of the ACM, 43(2):225-267, 1996. (The first version appearing in the Proceedings of the 10th ACM Symposium on Principles of Distributed Computing, 1991.)
- Tushar D. Chandra, Vassos Hadzilacos and Sam Toueg. The Weakest Failure Detector for Solving Consensus, Journal of the ACM, 43(4):685-722, 1996. (The first version appearing in the Proceedings of the 11th ACM Symposium on Principles of Distributed Computing, 1992.)
This pair of papers has had a deep impact on research in distributed computing. The work introduces and defines the notion of (unreliable) failure detectors in a distributed system, establishing a theory of failure detectors grounded on a general and precise framework. These papers have greatly influenced how one can reason about and deal with failures in distributed systems.
A failure detector is defined as an abstraction that provides each process with information about failures. The information may have different levels of accuracy and completeness, leading to different failure detectors, each precisely defined by properties relating the pattern of actual failures to the information provided by the failure detector. This modular approach separates definition from use and implementation, and leads to an elegant framework for developing algorithms and failure detector implementations, and for understanding the failure information and synchrony needed to solve distributed problems. The work has identified, in particular, the weakest failure detector for solving the consensus problem in an asynchronous system subject to failures.
This work has been very influential. It has led to the development of new failure detector-based protocols for solving consensus; to the definition of new failure detectors to tackle other distributed computing problems; to the study of various weak forms of partial synchrony as a means to obtain failure information; to the specification of the quality-of-service of failure detection; to the formulation of lower bounds of failure information needed to solve problems; and to a better understanding of the inherent hardness of solving distributed computing problems in the presence of failures. The failure detector approach has been adopted by the community to tackle many distributed problems, including group membership, atomic commit, various broadcast types, reliable communication, set agreement, mutual exclusion, transactional-memory contention management, leader election, and dining philosophers.
To summarize, this pioneering work introduces a general and elegant framework and theory for failure detectors; it establishes failure detectors as a first-class abstraction; and it proposes an approach to investigate problems based on this abstraction. The proposed approach has broadened the scope of research in distributed computing, leading to the study and understanding of an impressive variety of fundamental problems in distributed systems subject to failures.
2010 Dijkstra Prize in Distributed Computing Award Committee: