Fault Tolerance For Array Architectures (Thesis)
Abstract:
As array architectures become larger, surviving hardware failures
becomes a critical issue. To achieve fault tolerance and increase
reliability, redundancy is added to the system. In this way the
original working architectur e can be reconfigured after a failure is
detected by replacing components. Since in certain run--time
applications reconfiguration should be accomplished as quickly as
possible, the efficiency of reconfiguration of a fault--tolerant
structure becomes a major concern. This dissertation investigates the
relationships among reliability, reconfigurability and the hardware
costs of fault--tolerant structures for different classes of array
architectures. A mathematical framework is presented, and new
measures, {it local} and {it finite reconfigurability,} are defined.
For run--time fault tolerance, local and finite reconfigurability are
desirable properties. It is proved that local reconfigurability and a
fixed level of reliability cannot be achieved simultaneously unless a
dynamic graph is of dimension at least one greater than the
application graph. After this negative result, some highly reliable
and easily reconfigurable structures are constructed based on simple
layered graphs. An on--line distributed reconfiguration algorithm is
presented for finding new bipartite matchings for these constructions.
For run--time error detection in array processors, a methodology based
on data--dependency graphs is described. It combines the projection
method of deriving systolic arrays from dependency graphs with the
idea of input--triggered testing. The method is called {it ITRED,}
for {it Input--driven Time--Redundancy Error Detection.} Tests are
triggered by inserting special symbols in the input, and so the
approach gives the user flexibility in trading off throughput for
error coverage. The method requires no extra {it PE's} and little
extra hardware. To illustrate this approach, an error--detectable
systolic array is designed in CMOS for ALL--SUBSTRING COMPARISONS, a
problem which arises in applications such as information retrieval,
and DNA pattern matching.