A PROCRUSTEAN APPROACH TO STREAM PROCESSING

KATSIPOULAKIS, NIKOLAOS ROMANOS (2019) A PROCRUSTEAN APPROACH TO STREAM PROCESSING. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

The increasing demand for real-time data processing and the constantly growing data volume have contributed to the rapid evolution of Stream Processing Engines (SPEs), which are designed to continuously process data as it arrives. Low operational cost and timely delivery of results are both objectives of paramount importance for SPEs. Given the volatile and uncharted nature of data streams, achieving the aforementioned goals under fixed resources is a challenge. This calls for adaptable SPEs, which can react to fluctuations in processing demands.
In the past, three techniques have been developed to improve an SPE’s ability to adapt. They can be classified by whether an application requires exact or approximate results: stream partitioning and stream re-partitioning target exact processing, whereas load shedding targets approximate processing. Stream partitioning strives to balance load among processors, but previous techniques neglected the hidden costs of distributed execution. Load shedding lowers the accuracy of results by dropping part of the input, but previous techniques did not cope with evolving streams. Stream re-partitioning reconfigures execution while processing takes place, but previous techniques did not fully utilize window semantics.
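As a rough illustration of the load-shedding trade-off described above, the following Python sketch computes a tumbling-window sum exactly and under uniform random shedding, scaling each kept value by 1/keep_prob so that each window's estimate is unbiased. This is a generic textbook scheme, not one of the dissertation's techniques; the function names, the `(timestamp, value)` tuple format, and the `keep_prob` parameter are hypothetical.

```python
import random
from collections import defaultdict


def windowed_sum(stream, window_size):
    """Exact tumbling-window sum (hypothetical helper).

    `stream` is an iterable of (timestamp, value) pairs; each tuple is
    assigned to window `timestamp // window_size`.
    """
    sums = defaultdict(float)
    for ts, value in stream:
        sums[ts // window_size] += value
    return dict(sums)


def shed_windowed_sum(stream, window_size, keep_prob, seed=0):
    """Approximate tumbling-window sum under uniform random load shedding.

    Each tuple is kept with probability `keep_prob` (an assumed, fixed
    shedding rate); kept values are scaled by 1 / keep_prob so that each
    window's expected estimate equals its exact sum.
    Returns (per-window estimates, number of tuples actually processed).
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    rng = random.Random(seed)  # seeded for reproducibility
    sums = defaultdict(float)
    processed = 0
    for ts, value in stream:
        if rng.random() < keep_prob:  # drop the tuple otherwise
            sums[ts // window_size] += value / keep_prob
            processed += 1
    return dict(sums), processed


if __name__ == "__main__":
    stream = [(t, 1.0) for t in range(1000)]
    print(windowed_sum(stream, 100))
    print(shed_windowed_sum(stream, 100, keep_prob=0.5))
```

Lowering `keep_prob` reduces the number of tuples processed at the cost of a noisier per-window estimate; a window whose tuples are all dropped is simply absent from the result.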
In this dissertation, we put stream processing in a Procrustean bed, with respect to both the manner in which and the degree to which processing takes place. To this end, we present new approaches for window-based aggregate operators that are applicable to both exact and approximate stream processing in modern SPEs. Our stream partitioning, re-partitioning, and load shedding solutions improve performance and accuracy on real-world data by exploiting the semantics of both data and operations. In addition, we present SPEAr, the design of an SPE that accelerates processing by delivering approximate results with accuracy guarantees and avoiding unnecessary load. Finally, we contribute a hybrid technique, ShedPart, which can further improve the load balance and performance of an SPE.
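The idea of delivering approximate results with accuracy guarantees while avoiding unnecessary load can be sketched, under simplifying assumptions, as online sampling within a single window: tuples are consumed in random order, and processing stops once an approximate confidence interval for the window mean is narrower than a user-supplied bound. This is a standard sampling approach used only for illustration and is not SPEAr's actual design; `approx_window_mean`, `error_bound`, `z`, and `min_sample` are hypothetical names, and the interval relies on a normal approximation that may be poor for small or heavily skewed samples.

```python
import math
import random
import statistics


def approx_window_mean(window, error_bound, z=1.96, min_sample=30, seed=0):
    """Estimate the mean of one window from a random sample (hypothetical).

    Tuples are drawn without replacement in random order. After at least
    `min_sample` tuples, sampling stops once the half-width of a
    normal-approximation confidence interval (with finite-population
    correction) is <= error_bound.
    Returns (estimate, half_width, tuples_processed).
    """
    n_total = len(window)
    if n_total == 0:
        raise ValueError("empty window")
    order = list(range(n_total))
    random.Random(seed).shuffle(order)  # random processing order
    sample = []
    for idx in order:
        sample.append(window[idx])
        n = len(sample)
        # stdev needs >= 2 points; stop checking once the window is exhausted.
        if n >= max(min_sample, 2) and n < n_total:
            sd = statistics.stdev(sample)
            fpc = math.sqrt((n_total - n) / (n_total - 1))
            half_width = z * sd / math.sqrt(n) * fpc
            if half_width <= error_bound:
                # Bound met: skip the remaining tuples of this window.
                return statistics.fmean(sample), half_width, n
    # Every tuple was processed, so the result is exact.
    return statistics.fmean(sample), 0.0, n_total


if __name__ == "__main__":
    window = [float(v % 50) for v in range(10_000)]
    print(approx_window_mean(window, error_bound=1.0))
```

The third returned value shows how much of the window was actually processed; when the bound cannot be met, every tuple is consumed and the exact mean is returned. The standard deviation is recomputed at each step for clarity; a production version would maintain running statistics instead.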

Details

Item Type: University of Pittsburgh ETD
Status: Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
KATSIPOULAKIS, NIKOLAOS ROMANOS	nik37@pitt.edu	nik37	0000-0002-7845-5756

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee CoChair	LABRINIDIS, ALEXANDROS	labrinid@cs.pitt.edu	labrinid
Committee CoChair	CHRYSANTHIS, PANOS	panos@cs.pitt.edu	panos
Committee Member	LANGE, JACK	jacklange@cs.pitt.edu	jacklange
Committee Member	PAVLO, ANDREW	pavlo@cs.cmu.edu

Date: 22 May 2019
Date Type: Publication
Defense Date: 18 December 2018
Approval Date: 22 May 2019
Submission Date: 20 March 2019
Access Restriction: No restriction; release the ETD for access worldwide immediately.
Number of Pages: 195
Institution: University of Pittsburgh
Schools and Programs: School of Computing and Information > Computer Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Computer Science; Databases; Data Streams; Data Management
Date Deposited: 22 May 2019 12:34
Last Modified: 14 Apr 2020 05:15
URI: http://d-scholarship.pitt.edu/id/eprint/36391
