Show simple item record

dc.contributor.authorDiamos, Gregory Frederick
dc.contributor.authorWu, Haicheng
dc.contributor.authorLele, Ashwin
dc.contributor.authorWang, Jin
dc.date.accessioned2013-10-18T12:44:24Z
dc.date.available2013-10-18T12:44:24Z
dc.date.issued2012-02-01
dc.identifier.urihttp://hdl.handle.net/1853/49227
dc.description.abstractRelational databases remain an important application domain for organizing and analyzing the massive volume of data generated as sensor technology, retail and inventory transactions, social media, computer vision, and new fields continue to evolve. At the same time, processor architectures are beginning to shift towards hierarchical and parallel architectures employing throughput-optimized memory systems, lightweight multi-threading, and Single-Instruction Multiple-Data (SIMD) core organizations. Examples include general purpose graphics processing units (GPUs) such as NVIDIA’s Fermi, Intels Sandy Bridge, and AMD’s Fusion processors. This paper explores the mapping of primitive relational algebra operations onto GPUs. In particular, we focus on algorithms and data structure design identifying a fundamental conflict between the structure of algorithms with good computational complexity and that of algorithms with memory access patterns and instruction schedules that achieve peak machine utilization. To reconcile this conflict, our design space exploration converges on a hybrid multi-stage algorithm that devotes a small amount of the total runtime to prune input data sets using an irregular algorithm with good computational complexity. The partial results are then fed into a regular algorithm that achieves near peak machine utilization. The design process leading to the most efficient algorithm for each stage is described, detailing alternative implementations, their performance characteristics, and an explanation of why they were ultimately abandoned. The least efficient algorithm (JOIN) achieves 57% 􀀀 72% of peak machine performance depending on the density of the input. The most efficient algorithms (PRODUCT, PROJECT, and SELECT) achieve 86% 􀀀 92% of peak machine performance across all input data sets. To the best of our knowledge, these represent the best known published results to date for any implementations. This work lays the foundation for the development of a relational database system that achieves good scalability on a Multi-level Bulk- Synchronous-Parallel (Multi-BSP) processor architecture exemplified by modern GPUs.en_US
dc.language.isoen_USen_US
dc.publisherGeorgia Institute of Technologyen_US
dc.relation.ispartofseriesCERCS ; GIT-CERCS-12-01en_US
dc.subjectData structureen_US
dc.subjectDatabaseen_US
dc.subjectGPUen_US
dc.subjectPrimitiveen_US
dc.titleEfficient Relational Algebra Algorithms and Data Structures for GPUen_US
dc.typeTechnical Reporten_US
dc.contributor.corporatenameGeorgia Institute of Technology. College of Computingen_US
dc.contributor.corporatenameGeorgia Institute of Technology. Center for Experimental Research in Computer Systemsen_US
dc.contributor.corporatenamenVIDIA Corporationen_US
dc.embargo.termsnullen_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record