Write-Optimized Indexing for Log-Structured Key-Value Stores
The recent shift toward write-intensive workloads on big data (e.g., financial trading and social, user-generated data streams) has driven the proliferation of log-structured key-value stores, represented by Google’s BigTable, HBase, and Cassandra. These systems optimize write performance by adopting a log-structured merge design. They provide key-based access through a Put/Get interface, but they do not support value-based access. This significantly limits their applicability in many web and Internet applications, such as real-time search for all tweets or blogs containing “government shutdown”. In this paper, we present HINDEX, a write-optimized indexing scheme for log-structured key-value stores. To index intensively updated big data in real time, HINDEX keeps index maintenance lightweight through a design tailored to the unique characteristics of the underlying log-structured store. Concretely, HINDEX performs append-only index updates, which avoid reading historic data versions, an expensive operation in a log-structured store. To fix potentially obsolete index entries, HINDEX introduces an offline index repair process that is tightly coupled with routine compactions. HINDEX’s system design is generic to the Put/Get interface; we implemented a prototype of HINDEX on HBase without modifying its internal code. Our experiments show that HINDEX offers a significant performance advantage for write-intensive index maintenance.
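The two mechanisms described above are append-only index updates on the write path and index repair folded into compaction. They can be illustrated with a minimal sketch that runs against a toy in-memory model rather than HBase. The class `ToyIndexedStore` and its methods `put`, `get`, `lookup`, and `compact` are hypothetical names chosen for illustration. This is not HINDEX's implementation or HBase's API.

In the sketch, `put` never reads the row's previous value, so an older index entry can linger and briefly return a stale match. `compact` already holds the discarded versions in hand, so it deletes the index entries that point at them without any extra reads.

```python
from collections import defaultdict


class ToyIndexedStore:
    """Hypothetical in-memory stand-in for a log-structured store with a
    secondary index. It illustrates append-only index maintenance plus
    compaction-time repair; it is not HBase code or HINDEX itself."""

    def __init__(self):
        # Base table: row key -> list of (timestamp, value), oldest first.
        self.base = defaultdict(list)
        # Index table: indexed value -> set of (row key, timestamp).
        self.index = defaultdict(set)
        self.clock = 0  # logical timestamp source (assumption)

    def put(self, key, value):
        """Write path: append a new version and append an index entry.
        The row's old value is NOT read, so any index entry for the old
        value becomes obsolete but is left in place."""
        self.clock += 1
        self.base[key].append((self.clock, value))
        self.index[value].add((key, self.clock))

    def get(self, key):
        """Key-based read: return the newest value of the row, if any."""
        versions = self.base.get(key)
        return versions[-1][1] if versions else None

    def lookup(self, value):
        """Value-based read straight from the index. Until compaction
        runs, the result may include rows whose current value differs."""
        return sorted({key for key, _ in self.index.get(value, ())})

    def compact(self):
        """Compaction: keep only the newest version of each row. The
        discarded versions are already at hand, so delete the index
        entries that point at them (offline index repair)."""
        for key, versions in self.base.items():
            for ts, old_value in versions[:-1]:
                entries = self.index.get(old_value)
                if entries is not None:
                    entries.discard((key, ts))
                    if not entries:
                        del self.index[old_value]
            self.base[key] = versions[-1:]


if __name__ == "__main__":
    store = ToyIndexedStore()
    store.put("tweet1", "government shutdown")
    store.put("tweet1", "budget deal")  # no read of the old value
    print(store.lookup("government shutdown"))  # ['tweet1'] (stale entry)
    store.compact()
    print(store.lookup("government shutdown"))  # []
    print(store.lookup("budget deal"))  # ['tweet1']
```

In this sketch, the write path costs two appends regardless of history. The price is that index reads may see obsolete entries until the next compaction.
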
- CERCS Technical Reports