The Ginga Approach to Adaptive Query Processing in Large Distributed Systems
Paques, Henrique Wiermann
Processing and optimizing ad-hoc and continual queries in an open environment with distributed, autonomous, and heterogeneous data servers (e.g., the Internet) pose several technical challenges. First, it is well known that optimized query execution plans constructed at compile time make assumptions about the environment (e.g., network speed, data sources' availability). When such assumptions no longer hold at runtime, how can I guarantee the optimized execution of the query? Second, it is widely recognized that runtime adaptation is a complex and difficult task in terms of cost and benefit. How can I develop an adaptation methodology that makes runtime adaptation beneficial at an affordable cost? Last but not least, are there viable performance metrics and performance evaluation techniques for measuring the cost and validating the benefits of runtime adaptation methods? To address the new challenges posed by Internet query and search systems, several areas of computer science (e.g., databases and operating systems) are exploring the design of systems that adapt to their environment. However, despite the large number of adaptive systems proposed in the literature so far, most of them present a solution for adapting the system to one specific change in the runtime environment. Typically, these solutions are not easily "extendable" to let the system adapt to other runtime changes that their approach did not anticipate. In this dissertation, I study the problem of how to construct a framework in which I can catalog the known solutions to query processing adaptation, and how to develop an application that makes use of this framework. I call the solution to these two problems the Ginga approach.
I make three main contributions in this dissertation. The first contribution is the adoption of the Adaptation Space concept, combined with feedback-based control mechanisms, for coordinating and integrating different kinds of query adaptations to different runtime changes. The second contribution is the development of a systematic approach, called Ginga, that integrates the adaptation space with feedback control. This allows me to combine the generation of predefined query plans (at compile time) with reactive adaptive query processing (at runtime), including policies and mechanisms for determining when to adapt, what to adapt, and how to adapt. The third contribution is a detailed study of how to adapt to two important runtime changes, and their combination, encountered during the execution of distributed queries: memory constraints and end-to-end delays.
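The interplay between an adaptation space and a feedback loop can be illustrated with a minimal Python sketch. It is not the Ginga implementation: the `AdaptationCase` class, the plan names, the metric names, and the thresholds (64 MB of free memory, 500 ms of delay) are all hypothetical and chosen only to show how "when", "what", and "how" might be encoded and how predefined plans might be selected at runtime.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Metrics = Dict[str, float]


@dataclass
class AdaptationCase:
    """One entry in a hypothetical adaptation space."""
    name: str
    trigger: Callable[[Metrics], bool]  # when to adapt
    target: str                         # what to adapt
    plan: str                           # how: predefined plan to switch to


def choose_adaptation(cases: List[AdaptationCase],
                      metrics: Metrics) -> Optional[AdaptationCase]:
    """Return the first case whose trigger fires, or None if none does."""
    for case in cases:
        if case.trigger(metrics):
            return case
    return None


def run_feedback_loop(cases: List[AdaptationCase],
                      samples: List[Metrics],
                      initial_plan: str) -> List[str]:
    """Replay monitored samples; record the active plan after each one."""
    active = initial_plan
    history = []
    for metrics in samples:
        case = choose_adaptation(cases, metrics)
        # Switch plans only when a case fires; otherwise keep the current one.
        if case is not None and case.plan != active:
            active = case.plan
        history.append(active)
    return history


# Assumed thresholds and plan names, for illustration only.
# The combined case is listed first so it takes priority over single changes.
ADAPTATION_SPACE = [
    AdaptationCase("combined",
                   lambda m: m["free_memory_mb"] < 64 and m["delay_ms"] > 500,
                   "join algorithm and scheduling",
                   "low_memory_reordered_plan"),
    AdaptationCase("memory", lambda m: m["free_memory_mb"] < 64,
                   "join algorithm", "low_memory_plan"),
    AdaptationCase("delay", lambda m: m["delay_ms"] > 500,
                   "operator scheduling", "reordered_plan"),
]

SAMPLES = [
    {"free_memory_mb": 256, "delay_ms": 50},
    {"free_memory_mb": 32, "delay_ms": 50},
    {"free_memory_mb": 32, "delay_ms": 900},
    {"free_memory_mb": 256, "delay_ms": 900},
]

history = run_feedback_loop(ADAPTATION_SPACE, SAMPLES, "compile_time_plan")
print(history)
```

In this sketch the alternative plans are fixed ahead of time, mirroring the idea of generating predefined plans at compile time, while the loop reacts to monitored memory and delay values at runtime. A real system would also have to weigh the cost of switching plans against the expected benefit, which this example omits.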