A high-performance framework for analyzing massive complex networks
MetadataShow full item record
Graphs are a fundamental and widely-used abstraction for representing data. We can analytically study interesting aspects of real-world complex systems such as the Internet, social systems, transportation networks, and biological interaction data by modeling them as graphs. Graph-theoretic and combinatorial problems are also pervasive in scientific computing and engineering applications. In this dissertation, we address the problem of analyzing large-scale complex networks that represent interactions between hundreds of thousands to billions of entities. We present SNAP, a new high-performance computational framework for efficiently processing graph-theoretic queries on massive datasets. Graph analysis is computationally very different from traditional scientific computing, and solving massive graph-theoretic problems on current high performance computing systems is challenging due to several reasons. First, real-world graphs are often characterized by a low diameter and unbalanced degree distributions, and are difficult to partition on parallel systems. Second, parallel algorithms for solving graph-theoretic problems are typically memory intensive, and the memory accesses are fine-grained and highly irregular. The primary contributions of this dissertation are the design and implementation of novel parallel graph algorithms for traversal, shortest paths, and centrality computations, optimized for the small-world network topology, and high-performance multithreaded architectures and multicore servers. SNAP (Small-world Network Analysis and Partitioning) is a modular, open-source framework for the exploratory analysis and partitioning of large-scale networks. With SNAP, we demonstrate the capability to process massive graphs with billions of vertices and edges, and achieve up to two orders of magnitude speedup over state-of-the-art network analysis approaches. We also design a new parallel computing benchmark for characterizing the performance of graph-theoretic problems on high-end systems; study data representations for dynamic graph problems on parallel systems; and apply algorithms in SNAP to solve real-world problems in social network analysis and systems biology.