Dynamic Hierarchical Caching in Large-Scale Distributed File Systems
Abstract:
Most Distributed File Systems (DFSs) are based on a flat client-server
model in which each client interacts directly with the file server for
all file operations. While this model works well for relatively small
systems in which the file server has adequate capacity for all its
clients, it does not scale to large numbers of clients or systems in
which the clients are connected to the server through low-bandwidth
links. Server traffic can be reduced substantially if clients keep
even a modest-sized cache of previously read files. Intuitively, the
benefits of caching can be increased by organizing clients into a
hierarchy, in which only a small number of machines communicate
directly with the file server, providing intermediate caching services
to machines below them in the hierarchy. While this potentially
reduces server traffic for widely shared files, it can introduce a
significant delay for clients low in the hierarchy for access to files
with a low degree of sharing. This paper describes a simple method for
constructing dynamic hierarchies on a file-by-file basis. The results
of a trace-driven simulation of a dynamic hierarchical filesystem are
presented, yielding a reduction in server traffic of a factor of more
than two for shared files compared with a flat scheme and without a
large increase in client access time. An algorithm to maintain cache
consistency with low overhead by detecting missed cache invalidation
messages is given.