Shared Virtual Memory Across SMP Nodes Using Automatic Update: Protocols and Performance
Abstract:
As the workstation market moves from single-processor systems to small-scale
shared memory multiprocessors, it is very attractive to construct
larger-scale multiprocessors by connecting widely available symmetric
multiprocessors (SMPs) in a less tightly coupled way. Using a shared
virtual memory (SVM) layer for this purpose preserves the shared
memory programming abstraction across nodes. We explore the
feasibility and performance implications of one such approach by
extending the AURC (Automatic Update Release Consistency) protocol,
used in the SHRIMP multicomputer, to connect hardware-coherent SMPs
rather than uniprocessors. We describe the extended AURC protocol
and compare its performance both with the uniprocessor-node AURC case
and with an all-software Lazy Release Consistency (LRC) protocol
extended for SMPs. We present results based on detailed simulations
of two protocols (AURC and LRC) and two architectural configurations
of a system with 16 processors: one with one processor per node (16
nodes) and one with four processors per node (4 nodes).
We find that, unless the bandwidth of the network interface is
increased, the network interface becomes the bottleneck in a clustered
architecture, especially for AURC. While an LRC protocol can benefit
from the reduction in per-processor communication in a
clustered architecture, the write-through traffic in AURC
significantly increases the communication demands per network
interface. This causes more traffic contention and either prevents the
performance of AURC from improving with SMP nodes or hurts it severely for
applications with significant communication requirements. Thus, while
AURC performs better than LRC for applications with high
communication needs on uniprocessor nodes, the reverse may be true in clustered
architectures. Among possible solutions, two are investigated in the
paper: protocol changes and bandwidth increases. Further work is
clearly needed on the systems and application sides to evaluate
whether AURC can be extended for multiprocessor node systems.
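
The bottleneck argument above can be illustrated with a back-of-the-envelope
model. The sketch below is not the simulator used in the paper. The function
per_ni_load, its parameters, and all traffic figures are hypothetical and
chosen only to show the shape of the effect. It assumes that clustering keeps
a fraction of protocol traffic (page fetches, diffs) inside the node, and, as
a simplification, that AURC write-through traffic is not reduced by clustering.

```python
def per_ni_load(procs_per_node, protocol_per_proc,
                write_through_per_proc=0.0, local_fraction=0.0):
    """Traffic offered to one network interface, in arbitrary units.

    All arguments are hypothetical model parameters, not measured values.
    local_fraction is the share of protocol traffic satisfied inside the
    node; write-through traffic is assumed (a simplification) to always
    cross the network interface.
    """
    remote_protocol = protocol_per_proc * (1.0 - local_fraction)
    return procs_per_node * (remote_protocol + write_through_per_proc)


# Made-up per-processor traffic: LRC has protocol traffic only,
# AURC has less protocol traffic but adds write-through updates.
lrc_uni = per_ni_load(1, protocol_per_proc=10.0)
aurc_uni = per_ni_load(1, protocol_per_proc=6.0, write_through_per_proc=8.0)

# Four processors per node, with half of the protocol traffic now local.
lrc_smp = per_ni_load(4, protocol_per_proc=10.0, local_fraction=0.5)
aurc_smp = per_ni_load(4, protocol_per_proc=6.0,
                       write_through_per_proc=8.0, local_fraction=0.5)

print(f"LRC : {lrc_uni} -> {lrc_smp} per network interface")
print(f"AURC: {aurc_uni} -> {aurc_smp} per network interface")
```

Under these assumptions the load on each network interface grows for both
protocols when four processors share it, but it grows faster for AURC because
the write-through component scales with the number of processors per node
and gains nothing from intra-node sharing.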