Debugging Distributed Programs Using Controlled Re-execution

Neeraj Mittal, Vijay K. Garg

To appear at Nineteenth Annual ACM SIGACT-SIGOPS Symposium on PRINCIPLES OF DISTRIBUTED COMPUTING (PODC 2000), Portland, Oregon, 16-19 July 2000


Distributed programs are hard to write. A distributed debugger equipped with the mechanism to re-execute the traced computation in a \defn{controlled} fashion can greatly facilitate the detection and localization of bugs. This approach gives rise to a general problem, called \defn{predicate control problem}, which takes a computation and a safety property specified on the computation, and outputs a controlled computation that maintains the property. We define a class of global predicates, called \defn{{\myclass} predicates}, that can be controlled efficiently in a distributed computation. We prove that the synchronization generated by our algorithm is optimal. We further introduce the notion of an admissible sequence of events and prove that it is equivalent to the notion of predicate control. We then give an efficient algorithm for the class of disjunctive predicates using this notion.