Caching in Large-Scale Distributed File Systems (Thesis)
Abstract:
This thesis examines the problem of cache organization for very
large-scale distributed file systems (DFSs). Conventional DFSs, based
on the client--server model, suffer from bottlenecks when the total
client load exceeds the server's capacity. Previous work has
suggested that hierarchical client organizations can ameliorate the
problem somewhat, but at the expense of a substantial increase in
client latency. An analysis of existing DFS workloads reveals that
there is considerable regularity in client file access patterns and
that widely shared files lend themselves especially well to caching
techniques. In particular, a large proportion of ``cache miss''
traffic is for files that are already copied in another client's
cache. If clients can share these cached files, the server's load can
be reduced by a potentially large margin, making larger-scale systems
possible. We introduce the notion of {\em dynamic hierarchical
caching}, in which adaptive client hierarchies are constructed on a
file-by-file basis. Trace-driven simulation and workload-driven
runs of a prototype file system suggest that dynamic
hierarchies can reduce server load substantially without the client
performance penalties associated with more static schemes.
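
The effect of letting clients share cached files can be illustrated with a toy trace replay. The sketch below is not the simulator or prototype described in the thesis. It assumes unbounded client caches, read-only files, and a simple rule in which a missing client fetches from the most recent holder of the file; the names `replay`, `TRACE`, and `SERVER` are hypothetical. It counts server fetches with and without peer sharing and records, per file, which source each client fetched from, giving a small file-by-file hierarchy.

```python
from collections import defaultdict

SERVER = "server"  # hypothetical label for the file server

def replay(trace, peer_sharing):
    """Replay (client, file) reads and count fetches that reach the server.

    Illustrative assumptions: caches never evict, files are never written,
    and a miss is served by the most recent holder when sharing is enabled.
    """
    cached = defaultdict(set)     # client -> files held locally
    holders = defaultdict(list)   # file -> clients holding a copy, in order
    parents = defaultdict(dict)   # file -> {client: source it fetched from}
    server_fetches = 0
    for client, name in trace:
        if name in cached[client]:
            continue  # local cache hit, no traffic at all
        if peer_sharing and holders[name]:
            source = holders[name][-1]  # miss served by another client
        else:
            source = SERVER             # miss served by the server
            server_fetches += 1
        parents[name][client] = source
        cached[client].add(name)
        holders[name].append(client)
    return server_fetches, {name: dict(tree) for name, tree in parents.items()}

# A made-up trace in which one file ("libc") is widely shared.
TRACE = [("a", "libc"), ("b", "libc"), ("c", "libc"),
         ("a", "notes"), ("b", "libc"), ("c", "notes")]

if __name__ == "__main__":
    print(replay(TRACE, peer_sharing=False)[0])
    print(replay(TRACE, peer_sharing=True)[0])
```

On this made-up trace the server handles 5 fetches without sharing and 2 with it, because every miss after the first for a given file is served by a client that already holds a copy. Each file ends up with its own fetch tree, for example `libc` flows from the server to `a`, then to `b`, then to `c`.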