Enhancing availability in large scale storage systems and services: architectures and techniques

Show full item record

Please use this identifier to cite or link to this item: http://hdl.handle.net/1853/29715

Title: Enhancing availability in large scale storage systems and services: architectures and techniques
Author: Seshadri, Sangeetha
Abstract: Enterprises today are dealing with extremely large amounts of critical digital information that continues to grow at an astonishing rate. On the other hand, storage software (firmware, middleware) and systems are becoming much more complex and existing failure recovery mechanisms are insufficient to handle the scale of these systems while meeting high availability and service quality expectations. In addition, the concurrent development and quality assurance processes, the large number of test scenarios and the large scale of these systems and services imply that failures will be the norm rather than the exception. Therefore achieving high availability and reliability in storage systems remains a major concern and an open research challenge. Most existing work in the domain of storage system availability addresses failures of the storage media (such as disks) and recoverability from these failures. However, failures at the firmware and middleware layers remain largely unaddressed. This dissertation research addresses these challenges in depth across different storage architectures. Concretely, we make the following contributions: First, we develop a recovery conscious framework for multi-core architectures and a suite of techniques for performing efficient fine-grained recovery (micro-recovery) in storage controller firmware that can be retrofitted into legacy code. The framework includes a task-level recovery mechanism, the Log(Lock) architecture that allows system state restoration during micro-recovery, and recovery-conscious scheduling algorithms that are designed to reduce the ripple effect of failure and improve recovery efficiency and system availability. Our second technical contribution addresses the storage middleware availability. We develop the notion of hierarchical middleware architectures by organizing critical cluster management services into a hierarchical overlay network, which separates persistent application state from global system control state and demonstrate significant improvement in the availability and reliability of enterprise scale storage systems. In addition, we develop the notion of operator reuse and a suite of reuse techniques to improve data availability. The key idea of operator reuse is to efficiently utilize system resources by exploiting reuse opportunities in both operators and persistent state of computing nodes. We demonstrate our design through STREAMREUSE, a reuse-conscious store-forward network of storage nodes, which offers distributed stream query processing services.
Type: Dissertation
URI: http://hdl.handle.net/1853/29715
Date: 2009-05-04
Publisher: Georgia Institute of Technology
Subject: Query optimization
Software recovery
Stream processing systems
Storage systems
High-availability architectures
Data libraries
Data warehousing
Computer firmware
Data recovery (Computer science)
Department: Computing
Advisor: Committee Chair: Ling Liu; Committee Member: Brian Cooper; Committee Member: Calton Pu; Committee Member: Douglas Blough; Committee Member: Karsten Schwan
Degree: Ph.D.

All materials in SMARTech are protected under U.S. Copyright Law and all rights are reserved, unless otherwise specifically indicated on or in the materials.

Files in this item

Files Size Format View
seshadri_sangeetha_200908_phd.pdf 7.590Mb PDF View/ Open

This item appears in the following Collection(s)

Show full item record