
A PROCRUSTEAN APPROACH TO STREAM PROCESSING

KATSIPOULAKIS, NIKOLAOS ROMANOS (2019) A PROCRUSTEAN APPROACH TO STREAM PROCESSING. Doctoral Dissertation, University of Pittsburgh. (Unpublished)


Abstract

The increasing demand for real-time data processing and the constantly growing data volume have contributed to the rapid evolution of Stream Processing Engines (SPEs), which are designed to continuously process data as it arrives. Low operational cost and timely delivery of results are both objectives of paramount importance for SPEs. Given the volatile and uncharted nature of data streams, achieving the aforementioned goals under fixed resources is a challenge. This calls for adaptable SPEs, which can react to fluctuations in processing demands.
In the past, three techniques have been developed for improving an SPE’s ability to adapt. These techniques are classified by whether an application requires exact or approximate results: stream partitioning and re-partitioning target exact processing, whereas load shedding targets approximate processing. Stream partitioning strives to balance load among processors, but previous techniques neglected hidden costs of distributed execution. Load shedding lowers the accuracy of results by dropping part of the input, and previous techniques did not cope with evolving streams. Stream re-partitioning is used to reconfigure execution while processing takes place, and previous techniques did not fully utilize window semantics.
In this dissertation, we put stream processing in a procrustean bed, in terms of the manner and the degree that processing takes place. To this end, we present new approaches, for window-based aggregate operators, which are applicable to both exact and approximate stream processing in modern SPEs. Our stream partitioning, re-partitioning, and load shedding solutions offer improvements in performance and accuracy on real-world data by exploiting the semantics of both data and operations. In addition, we present SPEAr, the design of an SPE that accelerates processing by delivering approximate results with accuracy guarantees and avoiding unnecessary load. Finally, we contribute a hybrid technique, ShedPart, which can further improve load balance and performance of an SPE.


Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Creator: KATSIPOULAKIS, NIKOLAOS ROMANOS; Email: nik37@pitt.edu; Pitt Username: nik37; ORCID: 0000-0002-7845-5756
ETD Committee:
Committee CoChair: LABRINIDIS, ALEXANDROS; Email Address: labrinid@cs.pitt.edu; Pitt Username: labrinid
Committee CoChair: CHRYSANTHIS, PANOS; Email Address: panos@cs.pitt.edu; Pitt Username: panos
Committee Member: LANGE, JACK; Email Address: jacklange@cs.pitt.edu; Pitt Username: jacklange
Committee Member: PAVLO, ANDREW; Email Address: pavlo@cs.cmu.edu
Date: 22 May 2019
Date Type: Publication
Defense Date: 18 December 2018
Approval Date: 22 May 2019
Submission Date: 20 March 2019
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 195
Institution: University of Pittsburgh
Schools and Programs: School of Computing and Information > Computer Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Computer Science; Databases; Data Streams; Data Management
Date Deposited: 22 May 2019 12:34
Last Modified: 14 Apr 2020 05:15
URI: http://d-scholarship.pitt.edu/id/eprint/36391
