Stable and Fault-Tolerant Object Replication

Gregory Johnson and Ambuj Singh

To appear at Nineteenth Annual ACM SIGACT-SIGOPS Symposium on PRINCIPLES OF DISTRIBUTED COMPUTING (PODC 2000), Portland, Oregon, 16-19 July 2000


Support for efficient dynamic migration and replication of objects is essential for achieving adequate performance and scalability. Traditional solutions to the problem focused on competitiveness. This means that the algorithm's complexity matches the offline adversary's complexity within an acceptable ratio. We study the replication problem under two new measures: convergence and fault tolerance. Convergence requires that the algorithm approach the offline adversary's complexity when the object access patterns at the individual nodes of a distributed system stabilize. For fault-tolerance, we consider the performance of replication algorithms when at least t copies need to be maintained. We present a new algorithm based on the idea of sliding windows that is optimally convergent. We also derive a lower bound on the competitiveness of algorithms that always maintain at least t copies. Finally, we modify our earlier solution so that it is optimally fault-tolerant.