dc.contributor.advisorSchwan, Karsten
dc.contributor.authorAmur, Hrishikesh
dc.date.accessioned2014-01-13T16:53:23Z
dc.date.available2014-01-13T16:53:23Z
dc.date.created2013-12
dc.date.issued2013-11-18
dc.date.submittedDecember 2013
dc.identifier.urihttp://hdl.handle.net/1853/50397
dc.description.abstractComputing in the last decade has been characterized by the rise of data-intensive scalable computing (DISC) systems. In particular, recent years have witnessed a rapid growth in the popularity of fast analytics systems. These systems exemplify a trend where queries that previously involved batch processing (e.g., running a MapReduce job) on a massive amount of data are increasingly expected to be answered in near real-time with low latency. This dissertation addresses the problem that existing designs for various components used in the software stack for DISC systems do not meet the requirements demanded by fast analytics applications. In this work, we focus specifically on two components: 1. Key-value storage: Recent work has focused primarily on supporting reads with high throughput and low latency. However, fast analytics applications require that new data entering the system (e.g., new web pages crawled, currently trending topics) be quickly made available to queries and analysis codes. This means that along with supporting reads efficiently, these systems must also support writes with high throughput, which current systems fail to do. In the first part of this work, we solve this problem by proposing a new key-value storage system – called the WriteBuffer (WB) Tree – that provides up to 30× higher write performance and similar read performance compared to current high-performance systems. 2. GroupBy-Aggregate: Fast analytics systems require support for fast, incremental aggregation of data with low-latency access to results. Existing techniques are memory-inefficient and do not support incremental aggregation efficiently when aggregate data overflows to disk. In the second part of this dissertation, we propose a new data structure called the Compressed Buffer Tree (CBT) to implement memory-efficient in-memory aggregation. We also show how the WB Tree can be modified to support efficient disk-based aggregation.
dc.format.mimetypeapplication/pdf
dc.language.isoen_US
dc.publisherGeorgia Institute of Technology
dc.subjectKey-value storage
dc.subjectGroupBy-Aggregate
dc.subjectResource efficiency
dc.subjectWrite-optimized data structures
dc.subject.lcshReal-time data processing
dc.subject.lcshHigh performance computing
dc.titleStorage and aggregation for fast analytics systems
dc.typeDissertation
dc.description.degreePh.D.
dc.contributor.departmentComputer Science
thesis.degree.levelDoctoral
dc.contributor.committeeMemberAndersen, David G.
dc.contributor.committeeMemberGanger, Gregory R.
dc.contributor.committeeMemberGavrilovska, Ada
dc.contributor.committeeMemberWolf, Matthew
dc.contributor.committeeMemberVuduc, Richard
dc.date.updated2014-01-13T16:53:28Z

