Computer Science Thesis Proposal

Monday, April 30, 2018 - 11:00am

Location:

1507 Newell-Simon Hall

Speaker:

HUANCHEN ZHANG, Ph.D. Student http://www.cs.cmu.edu/~huanche1

This thesis seeks to address the challenge of building space-efficient yet high- performance in-memory search structures, including indexes and filters, to allow more efficient use of memory in OLTP databases. We show that we can achieve this goal by first designing fast static structures that leverage succinct data structures to approach the information-theoretic optimum in space, and then using the "hybrid index" architecture to obtain dynamicity with bounded and modest cost in space and performance.

To obtain space-efficient yet high-performance static data structures, we first in- troduce the Dynamic-to-Static rules that present a systematic way to convert existing dynamic structures to smaller immutable versions. We then present the Fast Succinct Trie (FST) and its application, the Succinct Range Filter (SuRF), to show how to leverage theories on succinct data structures to build static search structures that consume space close to the information-theoretic minimum while performing comparably to uncompressed indexes. To support dynamic operations such as inserts, deletes, and updates, we introduce the dual-stage hybrid index architecture that preserves the space efficiency brought by a compressed static index, while amortizing its performance overhead on dynamic operations by applying modifications in batches.

In the proposed work, we seek opportunities to further shrink the size of in-memory indexes by co-designing the indexes with the in-memory tuple storage. We also propose to complete the hybrid index work by extending the techniques to support concurrent indexes.

Thesis Committee:
David G. Andersen (Chair)
Michael Kaminsky
Andrew Pavlo
Kimberly Keeton (Hewlett Packard Enterprise)

Copy of Thesis Proposal Summary

For More Information, Contact:

Keywords:

Speaking Skills