dc.contributor.advisor: Liu, Ling
dc.contributor.author: Tang, Yuzhe
dc.date.accessioned: 2015-09-21T15:52:12Z
dc.date.available: 2015-09-22T05:30:06Z
dc.date.created: 2014-08
dc.date.issued: 2014-05-16
dc.date.submitted: August 2014
dc.identifier.uri: http://hdl.handle.net/1853/53995
dc.description.abstract: Cloud computing and big data technology continue to revolutionize how computing and data analysis are delivered today and in the future. To store and process fast-changing big data, various scalable systems (e.g., key-value stores and MapReduce) have recently emerged in industry. However, there is a large gap between what these open-source software systems offer and what real-world applications demand. First, scalable key-value stores are designed for simple data access methods, which limits their use in advanced database applications. Second, existing systems in the cloud need automatic performance optimization for better resource management with minimal operational overhead. Third, demand continues to grow for privacy-preserving search and information sharing between autonomous data providers, as exemplified by healthcare information networks. My Ph.D. research aims at bridging these gaps. First, I proposed HINDEX for secondary index support on top of write-optimized key-value stores (e.g., HBase and Cassandra). To update the index structure efficiently in the face of an intensive write stream, HINDEX synchronously executes append-only operations and defers the expensive index-repair operations. The core contribution of HINDEX is a scheduling framework for deferred and lightweight execution of index repairs. HINDEX has been implemented and is currently being transferred to an IBM big data product. Second, I proposed Auto-pipelining for automatic performance optimization of streaming applications on multi-core machines. The goal is to prevent the bottleneck scenario in which the streaming system is blocked by a single core while all other cores sit idle, wasting resources. To partition the streaming workload evenly across all cores and to search for the best partitioning among many possibilities, I proposed a heuristic-based search strategy that achieves locally optimal partitioning with lightweight search overhead. The key idea is to use a white-box approach to search for the theoretically best partitioning and then use a black-box approach to verify the effectiveness of that partitioning. The proposed technique, called Auto-pipelining, is implemented on IBM Stream S. Third, I proposed ε-PPI, a suite of privacy-preserving index algorithms that allow data sharing among unknown parties while maintaining a desired level of data privacy. To differentiate the privacy concerns of different persons, I proposed a personalized privacy definition and substantiated this new privacy requirement by injecting false positives into the published ε-PPI data. To construct the ε-PPI securely and efficiently, I proposed to optimize the performance of multi-party computations, which are otherwise expensive; the key idea is to use an inexpensive addition-homomorphic secret-sharing mechanism and to perform the distributed computation in a scalable P2P overlay.
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.publisher: Georgia Institute of Technology
dc.subject: Cloud
dc.subject: Big-data
dc.subject: Security
dc.subject: Efficiency
dc.subject: Performance
dc.subject: Streaming
dc.subject: Multi-core
dc.subject: Index
dc.subject: Key-value stores
dc.subject: Privacy preserving
dc.subject: Performance optimization
dc.subject: Log-structured systems
dc.title: Secure and high-performance big-data systems in the cloud
dc.type: Dissertation
dc.description.degree: Ph.D.
dc.contributor.department: Computer Science
dc.embargo.terms: 2015-08-01
thesis.degree.level: Doctoral
dc.contributor.committeeMember: Ahamad, Mustaque
dc.contributor.committeeMember: Blough, Doug
dc.contributor.committeeMember: Omiecinski, Edward
dc.contributor.committeeMember: Pu, Calton
dc.date.updated: 2015-09-21T15:52:12Z