GRUBJOIN: An Adaptive Multi-Way Windowed Stream Join with Time Correlation-Aware CPU Load Shedding
Yu, Philip S.
MetadataShow full item record
Dropping tuples has been commonly used for load shedding. However, tuple dropping generally is inadequate to shed load for multiway windowed stream joins. The output rate can be unnecessarily and severely degraded because tuple dropping does not recognize time correlations likely to exist among the streams. This paper introduces GrubJoin: an adaptive multi-way windowed stream join that efficiently performs time correlation-aware CPU load shedding. GrubJoin maximizes the output rate by achieving nearoptimal window harvesting within an operator throttling framework, i.e., regulating the fractions of the join windows that are processed by the multi-way join. Window harvesting performs the join using only certain more useful segments of the join windows. Due mainly to the combinatorial explosion of possible multi-way join sequences involving various segments of individual join windows, GrubJoin faces a set of unique challenges, such as determining the optimal window harvesting configuration and learning the time correlations among the streams. To tackle these challenges, we formalize window harvesting as an optimization problem, develop greedy heuristics to determine near-optimal window harvesting configurations and use approximation techniques to capture the time correlations among the streams. Experimental results show that GrubJoin is vastly superior to tuple dropping when time correlations exist among the streams and is equally effective as tuple dropping in the absence of time correlations.
- CERCS Technical Reports