Building Systems that Query on Compressed Data
In this talk, we present Succinct, a distributed data store that supports functionality comparable to state-of-the-art NoSQL stores, yet enables interactive queries on data sizes an order of magnitude larger than what is possible today (or, alternatively, up to two orders of magnitude faster queries at scale). Succinct accomplishes this by executing a wide range of queries (e.g., search, range, and even regular expression queries) directly on compressed data. Succinct achieves scale by storing the input data in compressed form, and interactivity by avoiding data scans and data decompression. We will also discuss how Succinct’s approach of executing queries on compressed data provides a new “lens” for exploring several classical systems problems, such as failure recovery, load spikes during transient failures, and skewed workloads, and leads to previously unachievable operating points in the system design space. Succinct is open source and is already being adopted in production clusters of several large-scale web services.
Rachit Agarwal is a postdoc in AMPLab at UC Berkeley, where he leads the Succinct project along with Ion Stoica. His research focuses on the core problems in distributed data-intensive systems, with the goal of building systems that not only aim for practical impact but also have a strong theoretical foundation. He completed his PhD at UIUC, working with Brighten Godfrey and Matthew Caesar, and his undergraduate degree at IIT Kanpur. During his PhD, he received the 2012 UIUC Rambus research award and the 2010 Wang-Chung research award for outstanding performance in computer engineering research, and was listed in the 2010 UIUC List of Teachers ranked as excellent.