Algorithms and optimizations for incremental window-based aggregationsShein, Anatoli U. (2020) Algorithms and optimizations for incremental window-based aggregations. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
AbstractOnline analytics, in most advanced scientific, business, and social media applications, rely heavily on the efficient execution of large numbers of Aggregate Continuous Queries (ACQs). ACQs continuously aggregate streaming data and periodically produce results such as max or average over a given window of the latest data. Incremental Evaluation is widely accepted for processing ACQs. It involves storing and reusing results of calculations performed over the unchanged parts of the window, rather than performing the re-evaluation of the entire window after each update. Recently proposed Incremental Evaluation techniques achieve high throughput and low latency in both single- and multi-query environments. In multi-query environments, these techniques share partial aggregates among all of the registered queries (i.e., all the queries are merged and processed as a single execution tree) to achieve maximum sharing. However, it was shown that maximum sharing does not always offer maximum performance, and selective sharing achieves better results by splitting the query load into multiple execution trees. To strike a balance between non-shared and fully shared query executions, the notion of Weavability was proposed, which led to several new Multi-Query optimizers. In this dissertation, we identify that (1) the current Incremental Evaluation techniques fail to exploit the semantics of aggregation operations, leaving considerable room for improvement, and (2) the Weavability-based Multi-Query optimizers target the Incremental Evaluation techniques that are agnostic towards the algebraic properties of the operations, preventing them from achieving improved throughput and latency, and perform Weavability calculations in an inefficient and resource-intensive way, hindering optimizers' scalability with the increasing ACQ load. Motivated by the above observations, in this dissertation we re-examine how the principle of sharing is applied in Incremental Evaluation techniques as well as in the Multi-Query optimizers. Our hypothesis is that sliding-window aggregation processing can benefit from (1) improving the performance of Incremental Evaluation by exploiting the algebraic properties of ACQ's underlying aggregate operations and (2) developing new Multi-Query optimizers that can target multi-node distributed environments and efficiently generate high quality execution plans by exploiting the new Incremental Evaluation techniques. This dissertation research contributes new algorithms for both Incremental Evaluation (FlatFIT and SlickDeque techniques), and Multi-Query optimization (Formula F1, Distributed ACQ Optimizers, and New Cost Estimation Methods). We evaluate all our contributions both theoretically in terms of time and space complexities, and experimentally in terms of throughput, latency, cost minimization, and load balancing using both real and synthetic datasets. Share
Details
MetricsMonthly Views for the past 3 yearsPlum AnalyticsActions (login required)
|