On the Fault Tolerance of Some Popular Bounded-Degree Networks
Abstract:
In this paper, we analyze the ability of several bounded-degree
networks that are commonly used for parallel computation to tolerate
faults. Among other things, we show that an $N$-node butterfly
containing $N^{1-epsilon}$ worst-case faults (for any constant
$epsilon > 0$) can emulate a fault-free butterfly of the same size
with only constant slowdown. Similar results are proved for the
shuffle-exchange graph. Hence, these networks become the first
connected bounded-degree networks known to be able to sustain more
than a constant number of worst-case faults without suffering more
than a constant-factor slowdown in performance. We also show that an
$N$-node butterfly whose nodes fail with some constant probability $p$
can emulate a fault-free version of itself with a slowdown of
$2^{O(log^* N)}$, which is a very slowly increasing function of $N$.
The proofs of these results combine the technique of redundant
computation with new algorithms for (packet) routing around faults in
hypercubic networks. Techniques for reconfiguring hypercubic networks
around faults that do not rely on redundant computation are also
presented. These techniques are less tolerant of faults but are more
widely applicable since they can be used with other networks such as
binary trees and meshes of trees.