Link to the University of Pittsburgh Homepage
Link to the University Library System Homepage Link to the Contact Us Form

Algorithms and optimizations for incremental window-based aggregations

Shein, Anatoli U. (2020) Algorithms and optimizations for incremental window-based aggregations. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

[img]
Preview
PDF
Download (7MB) | Preview

Abstract

Online analytics, in most advanced scientific, business, and social media applications, rely heavily on the efficient execution of large numbers of Aggregate Continuous Queries (ACQs). ACQs continuously aggregate streaming data and periodically produce results such as max or average over a given window of the latest data.

Incremental Evaluation is widely accepted for processing ACQs. It involves storing and reusing results of calculations performed over the unchanged parts of the window, rather than performing the re-evaluation of the entire window after each update. Recently proposed Incremental Evaluation techniques achieve high throughput and low latency in both single- and multi-query environments. In multi-query environments, these techniques share partial aggregates among all of the registered queries (i.e., all the queries are merged and processed as a single execution tree) to achieve maximum sharing. However, it was shown that maximum sharing does not always offer maximum performance, and selective sharing achieves better results by splitting the query load into multiple execution trees. To strike a balance between non-shared and fully shared query executions, the notion of Weavability was proposed, which led to several new Multi-Query optimizers.

In this dissertation, we identify that (1) the current Incremental Evaluation techniques fail to exploit the semantics of aggregation operations, leaving considerable room for improvement, and (2) the Weavability-based Multi-Query optimizers target the Incremental Evaluation techniques that are agnostic towards the algebraic properties of the operations, preventing them from achieving improved throughput and latency, and perform Weavability calculations in an inefficient and resource-intensive way, hindering optimizers' scalability with the increasing ACQ load.

Motivated by the above observations, in this dissertation we re-examine how the principle of sharing is applied in Incremental Evaluation techniques as well as in the Multi-Query optimizers. Our hypothesis is that sliding-window aggregation processing can benefit from (1) improving the performance of Incremental Evaluation by exploiting the algebraic properties of ACQ's underlying aggregate operations and (2) developing new Multi-Query optimizers that can target multi-node distributed environments and efficiently generate high quality execution plans by exploiting the new Incremental Evaluation techniques.

This dissertation research contributes new algorithms for both Incremental Evaluation (FlatFIT and SlickDeque techniques), and Multi-Query optimization (Formula F1, Distributed ACQ Optimizers, and New Cost Estimation Methods). We evaluate all our contributions both theoretically in terms of time and space complexities, and experimentally in terms of throughput, latency, cost minimization, and load balancing using both real and synthetic datasets.


Share

Citation/Export:
Social Networking:
Share |

Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
CreatorsEmailPitt UsernameORCID
Shein, Anatoli U.anatoli.shein@gmail.comaus40000-0002-3270-2758
ETD Committee:
TitleMemberEmail AddressPitt UsernameORCID
Committee ChairChrysanthis, Panos Kpanos@cs.pitt.edu
Committee MemberLabrinidis, Alexandroslabrinid@cs.pitt.edu
Committee MemberLee, Adam Jadamlee@pitt.edu
Committee MemberZadorozhny, Vladimir Iviz@pitt.edu
Date: 23 January 2020
Date Type: Publication
Defense Date: 3 September 2019
Approval Date: 23 January 2020
Submission Date: 3 December 2019
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 150
Institution: University of Pittsburgh
Schools and Programs: School of Computing and Information > Computer Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Data Stream, Sliding Window, Aggregate Query, Incremental Evaluation, SlickDeque, FlatFIT, F1 Multi-Query Optimization
Date Deposited: 23 Jan 2020 21:27
Last Modified: 23 Jan 2021 06:15
URI: http://d-scholarship.pitt.edu/id/eprint/37935

Metrics

Monthly Views for the past 3 years

Plum Analytics


Actions (login required)

View Item View Item