
dc.contributor.author [en_US]: Seshadri, Sangeetha
dc.date.accessioned: 2009-08-26T17:50:17Z
dc.date.available: 2009-08-26T17:50:17Z
dc.date.issued [en_US]: 2009-05-04
dc.identifier.uri: http://hdl.handle.net/1853/29715
dc.description.abstract [en_US]: Enterprises today manage extremely large amounts of critical digital information, and this information continues to grow at an astonishing rate. At the same time, storage software (firmware, middleware) and storage systems are becoming far more complex, and existing failure recovery mechanisms cannot handle the scale of these systems while meeting expectations for high availability and service quality. In addition, concurrent development and quality assurance processes, the large number of test scenarios, and the sheer scale of these systems and services imply that failures will be the norm rather than the exception. Therefore, achieving high availability and reliability in storage systems remains a major concern and an open research challenge. Most existing work on storage system availability addresses failures of the storage media (such as disks) and recovery from those failures. However, failures at the firmware and middleware layers remain largely unaddressed. This dissertation addresses these challenges in depth across different storage architectures. Concretely, we make the following contributions. First, we develop a recovery-conscious framework for multi-core architectures and a suite of techniques for efficient fine-grained recovery (micro-recovery) in storage controller firmware; these techniques can be retrofitted into legacy code. The framework includes a task-level recovery mechanism; the Log(Lock) architecture, which enables restoration of system state during micro-recovery; and recovery-conscious scheduling algorithms designed to limit the ripple effect of a failure and to improve recovery efficiency and system availability. Our second technical contribution addresses storage middleware availability.
We develop the notion of hierarchical middleware architectures, which organize critical cluster management services into a hierarchical overlay network that separates persistent application state from global system control state, and we demonstrate significant improvements in the availability and reliability of enterprise-scale storage systems. In addition, we develop the notion of operator reuse and a suite of reuse techniques to improve data availability. The key idea of operator reuse is to use system resources efficiently by exploiting reuse opportunities in both the operators and the persistent state of computing nodes. We demonstrate this design through STREAMREUSE, a reuse-conscious store-and-forward network of storage nodes that offers distributed stream query processing services.
dc.publisher [en_US]: Georgia Institute of Technology
dc.subject [en_US]: Query optimization
dc.subject [en_US]: Software recovery
dc.subject [en_US]: Middleware
dc.subject [en_US]: Firmware
dc.subject [en_US]: Stream processing systems
dc.subject [en_US]: Storage systems
dc.subject [en_US]: High-availability architectures
dc.subject.lcsh: Data libraries
dc.subject.lcsh: Data warehousing
dc.subject.lcsh: Middleware
dc.subject.lcsh: Computer firmware
dc.subject.lcsh: Data recovery (Computer science)
dc.title [en_US]: Enhancing availability in large scale storage systems and services: architectures and techniques
dc.type [en_US]: Dissertation
dc.description.degree [en_US]: Ph.D.
dc.contributor.department [en_US]: Computing
dc.description.advisor [en_US]: Committee Chair: Ling Liu; Committee Member: Brian Cooper; Committee Member: Calton Pu; Committee Member: Douglas Blough; Committee Member: Karsten Schwan