Cost-effective and privacy-conscious cloud service provisioning: architectures and algorithms
MetadataShow full item record
Cloud Computing represents a recent paradigm shift that enables users to share and remotely access high-powered computing resources (both infrastructure and software/services) contained in off-site data centers thereby allowing a more efficient use of hardware and software infrastructures. This growing trend in cloud computing, combined with the demands for Big Data and Big Data analytics, is driving the rapid evolution of datacenter technologies towards more cost-effective, consumer-driven, more privacy conscious and technology agnostic solutions. This dissertation is dedicated to taking a systematic approach to develop system-level techniques and algorithms to tackle the challenges of large-scale data processing in the Cloud and scaling and delivering privacy-aware services with anytime-anywhere availability. We analyze the key challenges in effective provisioning of Cloud services in the context of MapReduce-based parallel data processing considering the concerns of cost-effectiveness, performance guarantees and user-privacy and we develop a suite of solution techniques, architectures and models to support cost-optimized and privacy-preserving service provisioning in the Cloud. At the cloud resource provisioning tier, we develop a utility-driven MapReduce Cloud resource planning and management system called Cura for cost-optimally allocating resources to jobs. While existing services require users to select a number of complex cluster and job parameters and use those potentially sub-optimal per-job configurations, the Cura resource management achieves global resource optimization in the cloud by minimizing cost and maximizing resource utilization. We also address the challenges of resource management and job scheduling for large-scale parallel data processing in the Cloud in the presence of networking and storage bottlenecks commonly experienced in Cloud data centers. We develop Purlieus, a self-configurable locality-based data and virtual machine management framework that enables MapReduce jobs to access their data either locally or from close-by nodes including all input, output and intermediate data achieving significant improvements in job response time. We then extend our cloud resource management framework to support privacy-preserving data access and efficient privacy-conscious query processing. Concretely, we propose and implement VNCache: an efficient solution for MapReduce analysis of cloud-archived log data for privacy-conscious enterprises. Through a seamless data streaming and prefetching model in VNCache, Hadoop jobs begin execution as soon as they are launched without requiring any apriori downloading. At the cloud consumer tier, we develop mix-zone based techniques for delivering anonymous cloud services to mobile users on the move through Mobimix, a novel road-network mix-zone based framework that enables real time, location based service delivery without disclosing content or location privacy of the consumers.