Systems and Algorithms for High-Performance, Cost-Efficient Key-Value Storage
Abstract:
Key-value storage systems are increasingly essential building blocks of modern cloud and
big data applications. The workloads these systems support often require random access
to small objects over massive datasets with highly skewed and dynamic key popularity. It
is challenging for a storage cluster to serve these workloads with both high performance
and low-cost operations. Today’s systems usually sacrifice one for the other. In this dissertation,
we present novel approaches to improve both the performance and cost-efficiency
of key-value systems by combining new hardware and software techniques with careful
architectural design and algorithmic optimizations.
First, at cluster scale, we build SwitchKV, a heterogeneous system that uses small highend
cache nodes to guarantee load balancing across many SSD-based backend nodes under
nearly-arbitrary workloads. The cache nodes absorb the hottest queries so that no individual
backend node is over-burdened or underutilized. The system exploits OpenFlow switches
to enable efficient content-aware routing so that it can achieve scalable high throughput,
low tail latency, and high availability. It uses new algorithms to keep the cache and switch
forwarding rules updated with low overhead, and to ensure stable high performance under
rapidly changing workloads. SwitchKV can meet the service level objectives for many
cloud services more efficiently than traditional systems.
Second, to improve the efficiency of each individual multi-core server, we build a highthroughput
and memory-efficient concurrent hash table based around optimistic cuckoo
hashing. Our re-design minimizes critical section length, reduces interprocessor coherence
traffic, and enables effective prefetching through careful algorithm and data structure
engineering. We explore hardware transactional memory and fine-grained locking for concurrency
control, and find that both of them require the same level of algorithmic efforts
to achieve high performance. Our new hash table design greatly outperforms other optimized
concurrent hash tables for both read- and write-heavy workloads, even while using
substantially less memory for small key-value items