dQUOB: Managing Large Data Flows by Dynamic Embedded Queries
Abstract
The dQUOB system is a compiler and run-time environment used to embed
computational entities called Quoblets into high-volume data streams. The
data streams in question are the flows of data that arise in large-scale
visualizations, video streaming to large numbers of distributed users, and
high-volume business transactions. The dQUOB system lets a user specify
application-specific queries to control the data flow, that is, queries that
examine the data flow and make decisions before computations are
performed. By coupling queries with computations, the decision-making
of a computational entity is enhanced and, more broadly, the scalability of
the entire data flow is increased.
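As a rough illustration of coupling a query with a computation, the
following Python sketch guards a computation with a predicate, so that only
events the query accepts are processed. The names Quoblet, query, and
compute, and the event fields, are hypothetical and are not the dQUOB
interface; the sketch uses plain Python callables rather than compiled
queries.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator

Event = Dict[str, Any]


@dataclass
class Quoblet:
    """Hypothetical query/computation pair; not the dQUOB API."""

    query: Callable[[Event], bool]     # decides whether the computation runs
    compute: Callable[[Event], Event]  # runs only on events the query accepts

    def run(self, stream: Iterable[Event]) -> Iterator[Event]:
        for event in stream:
            if self.query(event):      # decision is made before the computation
                yield self.compute(event)


# Illustrative data: field names and values are invented for this sketch.
stream = [
    {"region": 1, "value": 2},
    {"region": 7, "value": 9},
    {"region": 7, "value": 4},
]

quoblet = Quoblet(
    query=lambda e: e["region"] == 7,
    compute=lambda e: {**e, "doubled": e["value"] * 2},
)

results = list(quoblet.run(stream))
```

Only the two events with region 7 reach the computation; the first event is
discarded by the query and incurs no computation cost.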
The first goal of the paper is to provide an overview of the dQUOB system
focused on the features that make it useful for data streaming in
grid-based computing environments. Our second goal is to establish the
strength of the dQUOB system through measurement. By benchmarking an
embedded query/computation object, we can determine its overhead cost. By
using application-specific data and computations, we explore the cases where
embedded computation and dynamic changes to the computation make sense from
a cost tradeoff point of view. Finally, we measure the ability of
queries to reduce end-to-end latency and show that the query itself must be
written with care.
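The cost tradeoff can be pictured with a toy per-event model, sketched
below in Python. This is an assumption made for illustration, not the cost
metric used in the paper: every event pays the query cost, and only the
fraction of events the query accepts pays the computation cost. The function
names and the pass_rate parameter are hypothetical.

```python
def expected_cost_with_query(query_cost: float,
                             compute_cost: float,
                             pass_rate: float) -> float:
    """Toy model: always pay for the query; pay for the computation
    only for the fraction of events the query accepts."""
    return query_cost + pass_rate * compute_cost


def query_pays_off(query_cost: float,
                   compute_cost: float,
                   pass_rate: float) -> bool:
    """True when guarding the computation is cheaper than always running it."""
    return expected_cost_with_query(query_cost, compute_cost, pass_rate) < compute_cost


# A cheap, selective query: 1 + 0.5 * 10 = 6, which is below 10.
cheap_selective = query_pays_off(query_cost=1, compute_cost=10, pass_rate=0.5)

# A costly, permissive query: 3 + 0.75 * 10 = 10.5, which exceeds 10.
costly_permissive = query_pays_off(query_cost=3, compute_cost=10, pass_rate=0.75)
```

Under this toy model a cheap, selective query lowers the expected cost per
event, while a costly query that accepts most events only adds overhead.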