Debugging Distributed Programs Using Controlled Re-execution
Neeraj Mittal, Vijay K. Garg
To appear at Nineteenth Annual
ACM SIGACT-SIGOPS Symposium on PRINCIPLES OF DISTRIBUTED COMPUTING (PODC
2000), Portland, Oregon, 16-19 July 2000
Abstract
Distributed programs are hard to write. A distributed debugger equipped
with the mechanism to re-execute the traced computation in a \defn{controlled}
fashion can greatly facilitate the detection and localization of bugs.
This approach gives rise to a general problem, called \defn{predicate control
problem}, which takes a computation and a safety property specified on
the computation, and outputs a controlled computation that maintains the
property. We define a class of global predicates, called \defn{{\myclass}
predicates}, that can be controlled efficiently in a distributed computation.
We prove that the synchronization generated by our algorithm is optimal.
We further introduce the notion of an admissible sequence of events and
prove that it is equivalent to the notion of predicate control. We then
give an efficient algorithm for the class of disjunctive predicates using
this notion.