Stable and Fault-Tolerant Object Replication
Gregory Johnson and Ambuj Singh
To appear at Nineteenth Annual
ACM SIGACT-SIGOPS Symposium on PRINCIPLES OF DISTRIBUTED COMPUTING (PODC
2000), Portland, Oregon, 16-19 July 2000
Abstract
Support for efficient dynamic migration and replication of objects is essential
for achieving adequate performance and scalability. Traditional solutions
to the problem focused on competitiveness. This means that the algorithm's
complexity matches the offline adversary's complexity within an acceptable
ratio. We study the replication problem under two new measures: convergence
and fault tolerance. Convergence requires that the algorithm approach the
offline adversary's complexity when the object access patterns at the individual
nodes of a distributed system stabilize. For fault-tolerance, we consider
the performance of replication algorithms when at least t copies need to
be maintained. We present a new algorithm based on the idea of sliding
windows that is optimally convergent. We also derive a lower bound on the
competitiveness of algorithms that always maintain at least t copies. Finally,
we modify our earlier solution so that it is optimally fault-tolerant.