Understanding Application Performance on Shared Virtual Memory Systems
Abstract:
Many researchers have proposed interesting protocols for shared
virtual memory (SVM) systems, and demonstrated
performance improvements on parallel programs. However, there is
still no clear understanding of the performance potential of
SVM systems for different classes of applications.
This paper begins to fill this gap, by studying the performance of
a range of applications in detail and understanding it
in light of application characteristics.
We first develop a brief classification of the inherent
data sharing patterns in the applications, and how they interact with
system granularities to yield the communication patterns relevant to
SVM systems. We then use detailed simulation to compare
the performance of two SVM approaches---Lazy Release Consistency
(LRC) and Automatic
Update Release Consistency (AURC)---with each other and with an
all-hardware CC-NUMA approach. We examine how performance is affected
by problem size, machine size, key system parameters, and the use of
less optimized program implementations. We find that SVM can
indeed perform quite well on systems of up to at least 32 processors
for several nontrivial applications. However, performance is much
more variable across applications than on CC-NUMA systems, and the
problem sizes needed to obtain good parallel performance are
substantially larger. The hardware-assisted AURC system tends to
perform significantly better than the all-software LRC under our
system assumptions, particularly when realistic cache hierarchies are
used.
This technical report has been published as:
- Understanding Application Performance on Shared Virtual
  Memory. Liviu Iftode, Jaswinder Pal Singh and Kai Li,
  Proc. of the 23rd Annual International Symposium on
  Computer Architecture, May 1996, pp. 122-133.