KATSIPOULAKIS, NIKOLAOS ROMANOS
(2019)
A PROCRUSTEAN APPROACH TO STREAM PROCESSING.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
The increasing demand for real-time data processing and the constantly growing data volume have contributed to the rapid evolution of Stream Processing Engines (SPEs), which are designed to continuously process data as it arrives. Low operational cost and timely delivery of results are both objectives of paramount importance for SPEs. Given the volatile and uncharted nature of data streams, achieving the aforementioned goals under fixed resources is a challenge. This calls for adaptable SPEs, which can react to fluctuations in processing demands.
In the past, three techniques have been developed for improving an SPE’s ability to adapt. Those techniques are classified based on applications’ requirements on exact or approximate results: stream partitioning, and re-partitioning target exact, and load shedding targets approximate processing. Stream partitioning strives to balance load among processors, and previous techniques neglected hidden costs of distributed execution. Load Shedding lowers the accuracy of results by dropping part of the input, and previous techniques did not cope with evolving streams. Stream re-partitioning is used to reconfigure execution while processing takes place, and previous techniques did not fully utilize window semantics.
In this dissertation, we put stream processing in a procrustean bed, in terms of the manner and the degree that processing takes place. To this end, we present new approaches, for window-based aggregate operators, which are applicable to both exact and approximate stream processing in modern SPEs. Our stream partitioning, re-partitioning, and load shedding solutions offer improvements in performance and accuracy on real-world data by exploiting the semantics of both data and operations. In addition, we present SPEAr, the design of an SPE that accelerates processing by delivering approximate results with accuracy guarantees and avoiding unnecessary load. Finally, we contribute a hybrid technique, ShedPart, which can further improve load balance and performance of an SPE.
Share
Citation/Export: |
|
Social Networking: |
|
Details
Item Type: |
University of Pittsburgh ETD
|
Status: |
Unpublished |
Creators/Authors: |
|
ETD Committee: |
|
Date: |
22 May 2019 |
Date Type: |
Publication |
Defense Date: |
18 December 2018 |
Approval Date: |
22 May 2019 |
Submission Date: |
20 March 2019 |
Access Restriction: |
No restriction; Release the ETD for access worldwide immediately. |
Number of Pages: |
195 |
Institution: |
University of Pittsburgh |
Schools and Programs: |
School of Computing and Information > Computer Science |
Degree: |
PhD - Doctor of Philosophy |
Thesis Type: |
Doctoral Dissertation |
Refereed: |
Yes |
Uncontrolled Keywords: |
Computer Science
Databases
Data Streams
Data Management |
Date Deposited: |
22 May 2019 12:34 |
Last Modified: |
14 Apr 2020 05:15 |
URI: |
http://d-scholarship.pitt.edu/id/eprint/36391 |
Metrics
Monthly Views for the past 3 years
Plum Analytics
Actions (login required)
|
View Item |