Show simple item record

dc.contributor.advisorKim, Taesoo
dc.contributor.authorMaass, Steffen Alexander
dc.date.accessioned2019-08-21T13:50:31Z
dc.date.available2019-08-21T13:50:31Z
dc.date.created2019-08
dc.date.issued2019-05-02
dc.date.submittedAugust 2019
dc.identifier.urihttp://hdl.handle.net/1853/61679
dc.description.abstractLarge-scale internet services, such as Facebook or Google, are using clusters of many servers for problems such as search, machine learning, and social networks. However, while it may be possible to apply the tools used at this scale to smaller, more common problems as well, this dissertation presents approaches to large-scale data processing on only a single machine. This approach has obvious cost benefits and lowers the barrier of entrance to large-scale data processing. This dissertation approaches this problem by redesigning applications to enable trillion-scale graph processing on a single machine while also enabling the processing of evolving, billion-scale graphs. First, this dissertation presents a new out-of-core graph processing engine, called Mosaic, for executing graph algorithms on trillion-scale datasets on a single machine. Mosaic makes use of many-core processors and fast I/O devices coupled with a novel graph encoding scheme to allow processing of graphs of up to one trillion edges on a single machine. Mosaic also employs a locality-preserving, space-filling curve to allow for high compression and high locality when storing graphs and executing algorithms. Our evaluation shows that for smaller graphs, Mosaic consistently outperforms other state-of-the-art out-of-core engines by 3.2x–58.6x and shows comparable performance to distributed graph engines. Furthermore, Mosaic can complete one iteration of the Pagerank algorithm on a trillion-edge graph in 21 minutes, outperforming a distributed disk-based engine by 9.2x. Second, while Mosaic addresses the setting of processing static graph, this dissertation presents Cytom, a new engine for processing billion-scale evolving graphs based on insights about achieving high compression and locality while improving load-balancing when processing a graph that changes rapidly. Cytom introduces a novel programming model that takes advantage of its subgraph-centric approach coupled with the setting of evolving graphs. This is an important enabling step for emerging workloads when processing graphs that change over time. Cytom’s programming model allows algorithm developers to quickly react to graph updates, discarding uninteresting ones while focusing on updates that, in fact, change the algorithmic result. We show that Cytom is effective in scaling to billion-edge graphs, as well as providing higher throughput when updating the graph structure (2.0x–192x) and higher throughput (1.5x–200x) when additionally processing an algorithm.
dc.format.mimetypeapplication/pdf
dc.language.isoen_US
dc.publisherGeorgia Institute of Technology
dc.subjectRuntime system
dc.subjectBig data
dc.subjectGraph analytics
dc.subjectPerformance optimization
dc.subjectIncremental processing
dc.subjectHeterogeneous computing
dc.titleSystems abstractions for big data processing on a single machine
dc.typeText
dc.description.degreePh.D.
dc.contributor.departmentComputer Science
thesis.degree.levelDoctoral
dc.contributor.committeeMemberGavrilovska, Ada
dc.contributor.committeeMemberRamachandran, Umakishore
dc.contributor.committeeMemberKrishna, Tushar
dc.contributor.committeeMemberZwaenepoel, Willy
dc.type.genreDissertation
dc.date.updated2019-08-21T13:50:31Z


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record