Long-Term Caching Strategies for Very Large Distributed File Systems
Abstract:
This paper examines the feasibility of using long-term (disk-based)
caches in very large distributed file systems (DFSs). We begin with an
analysis of file access patterns in a distributed Unix workstation
environment and identify properties useful to the DFS designer. We
then introduce long-term caching strategies that maintain consistency
while dramatically reducing the load on file servers. We describe a
number of algorithms for maintaining client caches, and present the
results of a trace-driven simulation that shows how relatively small
disk-based caches can be used to reduce server traffic by 60% to 90%.
Finally, we outline possible mechanisms for dynamically organizing
these caches into adaptive hierarchies to allow arbitrary scaling of
the number of clients and the use of low-bandwidth communication
networks. A small (two- or three-level) hierarchy, coupled with smart
caching techniques, has the potential to reduce traffic by an order of
magnitude or more relative to a flat scheme.
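
To make the idea of a trace-driven cache simulation concrete, the following is a minimal sketch, not the simulator used in this paper. It assumes whole-file caching with LRU replacement and a read-only trace of (file_id, size) pairs. The function name simulate_lru, the trace format, and the toy trace are all hypothetical, and consistency traffic is not modeled.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity_bytes):
    """Replay (file_id, size_bytes) read accesses through a whole-file LRU cache.

    Returns the fraction of requested bytes that had to be fetched from the
    server. LRU and whole-file caching are assumptions made for illustration.
    """
    cache = OrderedDict()  # file_id -> size, least recently used first
    used = 0               # bytes currently held in the cache
    requested = 0          # total bytes the client asked for
    fetched = 0            # bytes that missed and went to the server
    for file_id, size in trace:
        requested += size
        if file_id in cache:
            cache.move_to_end(file_id)  # hit: mark as most recently used
            continue
        fetched += size                 # miss: the server supplies the file
        if size > capacity_bytes:
            continue                    # too large to cache; pass through
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict the LRU file
            used -= evicted_size
        cache[file_id] = size
        used += size
    return fetched / requested if requested else 0.0


# Toy trace, invented purely for illustration.
trace = [("a", 4), ("b", 4), ("a", 4), ("c", 4), ("a", 4), ("b", 4)]
miss_fraction = simulate_lru(trace, capacity_bytes=8)
print(f"server traffic: {miss_fraction:.0%} of requested bytes")
```

Replaying the same trace at several values of capacity_bytes gives a curve of server traffic against cache size, which is the kind of result the simulation summarized above reports.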