Application-Controlled File Caching and Prefetching (Thesis)
Abstract:
As disk performance continues to lag behind that of microprocessors and memory
systems, the file system is increasingly becoming the bottleneck for many
applications.
This dissertation demonstrates that with appropriate mechanisms and policies,
application-controlled file caching and prefetching can significantly improve
the file I/O performance of applications in both single and multiple process
cases.
In traditional file systems, the kernel controls file cache replacement and
prefetching using fixed policies, with no input from user processes. For many
applications, this has
resulted in poor utilization of the file cache and little overlap between
I/O and computation.
The challenges are to design a scheme that allows applications to control the
management of their own file caches, and to design kernel algorithms that
coordinate the use of shared resources so that the performance of the
whole system is guaranteed to improve.
This dissertation proposes two-level file cache management: the kernel
allocates physical pages to individual applications, and each application is
responsible for deciding how to use its physical pages. The dissertation
addresses three issues:
a global allocation policy that allows
applications to control their own cache replacement while
maintaining fair allocation of cache blocks among processes,
integrated algorithms for caching and prefetching,
and a low-overhead mechanism to
implement the interactions between user processes and the kernel.
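The division of labor above can be illustrated with a minimal Python sketch. This is not the ACFS interface; the class names, the equal-split allocation, and the "lru"/"mru" policy choices are all illustrative assumptions standing in for the dissertation's actual allocation policy and application-supplied replacement decisions:

```python
from collections import OrderedDict

class AppCache:
    """One application's share of the file cache. The kernel fixes the
    quota; the application chooses its own replacement policy
    (illustrative 'lru' / 'mru' options, not the real ACFS hooks)."""
    def __init__(self, quota, policy="lru"):
        self.quota = quota
        self.policy = policy
        self.pages = OrderedDict()  # block id -> page, kept in access order
        self.misses = 0

    def access(self, block):
        if block in self.pages:
            self.pages.move_to_end(block)  # mark as most recently used
            return True                    # cache hit
        self.misses += 1
        if len(self.pages) >= self.quota:  # evict per this app's policy
            if self.policy == "lru":
                self.pages.popitem(last=False)  # drop least recently used
            else:                               # "mru"
                self.pages.popitem(last=True)   # drop most recently used
        self.pages[block] = object()
        return False

class Kernel:
    """Global level: divides physical pages among processes. An equal
    split stands in here for a fair global allocation policy."""
    def __init__(self, total_pages, apps):
        quota = total_pages // len(apps)
        self.caches = {name: AppCache(quota, policy)
                       for name, policy in apps.items()}
```

The sketch shows why application control can pay off: a process that cyclically scans more blocks than its quota thrashes under the kernel's one-size-fits-all LRU, but avoids most misses if it is allowed to choose MRU eviction for that access pattern.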
A prototype file system, ACFS, is implemented to
experiment with application-controlled file caching and prefetching on
a suite of I/O-intensive applications.
Experiments show that
application-controlled file caching and prefetching combined with
disk scheduling significantly improves
the file I/O performance for applications:
individual applications' running times are reduced by 3% to 49%
(average 26%), and multi-process workloads' running times are reduced by
5% to 76% (average 32%).
Each technique provides substantial performance benefits:
application-controlled file caching reduces the number of disk I/Os,
carefully integrated caching and prefetching increases the overlap between
CPU computation and disk accesses,
and disk scheduling combined with prefetching reduces the average disk
access latency.
The combination of all three techniques provides the best performance.