Overcoming Memory Capacity Constraints for Large Graph Applications on GPUs
Graphics Processing Units (GPUs) have been used successfully to accelerate a wide variety of applications in scientific computing, machine learning, and data analytics over the last decade. Two important trends have emerged across these domains: for many real-world problems, the working sets are larger than a GPU's memory capacity, and the data is sparse. In this dissertation, we focus on graph analytics, where the majority of prior work has been restricted to graphs of modest size that fit in memory. Real-world graphs such as social networks and web graphs require tens to hundreds of gigabytes of storage, whereas GPU memory is typically on the order of a few gigabytes. We investigate the following question: how can we accelerate graph applications on GPUs when the graphs do not fit in memory?

This question opens up two lines of inquiry. First, we consider a system architecture in which the GPU can address larger, albeit slower, host memory behind an interconnect such as PCI-e. While this increases the total addressable memory, graph applications have poor locality, which makes efficient use of this architecture challenging. We formulate the locality problem as a graph ordering problem and propose efficient reordering methods for large graphs. The solution is general enough to extend to other similar architectures, and it is beneficial even when graphs fit in memory.

Second, we consider graph compression as a complementary approach. Conventional approaches to graph compression have been CPU-centric: the decompression stage is sequential and branch intensive. We devise an efficient compression format for large sparse graphs that is amenable to run-time decompression on GPUs. The decompression stage is parallel and load balanced so that it uses the computational resources of GPUs effectively.
In effect, analytics kernels can decompress portions of the graph and compute on them on the fly, which is essential because the fully decompressed graph would not fit in memory.
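The on-the-fly idea can be sketched with a simple gap-plus-varint scheme: each sorted adjacency list is stored as variable-length-encoded gaps, and a kernel decodes one neighbor at a time as it traverses, never materializing the full list. This is an illustrative CPU sketch under assumed function names (`encode_adj`, `decode_adj`); the dissertation's GPU format must additionally support parallel, load-balanced decoding, which this sequential sketch does not show:

```python
def encode_adj(neighbors):
    """Delta-encode a sorted adjacency list, then pack each gap as a
    little-endian varint (7 payload bits per byte, high bit = continue)."""
    out, prev = bytearray(), 0
    for v in sorted(neighbors):
        gap = v - prev
        prev = v
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)

def decode_adj(buf):
    """Decode on the fly: yield neighbors one at a time without
    materializing the whole decompressed list."""
    prev = gap = shift = 0
    for byte in buf:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:               # continuation bit: more bytes follow
            shift += 7
        else:                         # last byte of this gap
            prev += gap
            yield prev
            gap = shift = 0
```

A traversal kernel would consume `decode_adj` lazily, so only the adjacency lists currently being processed exist in decompressed form.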