Scalable big data systems: Architectures and optimizations
MetadataShow full item record
Big data analytics has become not just a popular buzzword but also a strategic direction in information technology for many enterprises and government organizations. Even though many new computing and storage systems have been developed for big data analytics, scalable big data processing has become more and more challenging as a result of the huge and rapidly growing size of real-world data. Dedicated to the development of architectures and optimization techniques for scaling big data processing systems, especially in the era of cloud computing, this dissertation makes three unique contributions. First, it introduces a suite of graph partitioning algorithms that can run much faster than existing data distribution methods and inherently scale to the growth of big data. The main idea of these approaches is to partition a big graph by preserving the core computational data structure as much as possible to maximize intra-server computation and minimize inter-server communication. In addition, it proposes a distributed iterative graph computation framework that effectively utilizes secondary storage to maximize access locality and speed up distributed iterative graph computations. The framework not only considerably reduces memory requirements for iterative graph algorithms but also significantly improves the performance of iterative graph computations. Last but not the least, it establishes a suite of optimization techniques for scalable spatial data processing along with three orthogonal dimensions: (i) scalable processing of spatial alarms for mobile users traveling on road networks, (ii) scalable location tagging for improving the quality of Twitter data analytics and prediction accuracy, and (iii) lightweight spatial indexing for enhancing the performance of big spatial data queries.