Towards Scalable, Cloud Based, Confidential Data Stream Processing

Thoma, Cory (2019) Towards Scalable, Cloud Based, Confidential Data Stream Processing. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

This is the latest version of this item.



Increases in data availability, velocity, variability, and size have led to the development of new data processing paradigms that offer users different ways to process and manage data specific to their needs. One such paradigm is data stream processing, as managed by Data Stream Processing Systems (DSPS). In contrast to traditional database management systems, in which data is stationary and queries are transient, in stream processing systems data is transient and queries are stationary (that is, continuous and long-running). In such systems, users expect to process temporal data, where data is considered only for some period of time and discarded afterward. Often, as with many other software applications, those who employ such systems outsource computation to third-party computation platforms such as Amazon, IBM, or Google. Using a third party outsources not only computation but also hardware and software maintenance costs, relieving the user of having to incur these costs themselves. Moreover, when users outsource their DSPS, they often have a service level agreement that places guarantees on service availability and uptime.

Given the above benefits to outsourcing computation, it is clearly desirable for a user to outsource their DSPS computation. Such outsourcing, however, may violate the privacy constraints of those who provide the data stream. Specifically, they may not wish to share their plaintext data with a third party that they may not trust. This leads to an interesting dichotomy between the desire of the user to outsource as much of their computation as possible and the desire of the data stream providers to keep their data private and avoid leaking data to a third-party system. Current work that explores linking the two poles of this dichotomy either limits the expressiveness of supported queries, requires the data provider to trust the third-party systems, or incurs computational or monetary overheads prohibitive for the querier.

In this dissertation, we explore methods for shrinking the gap between the poles of this dichotomy and overcoming the limitations of state-of-the-art systems by providing data providers and queriers with efficient access control enforcement on untrusted third-party systems over encrypted data. Specifically, we introduce our system PolyStream for executing queries on encrypted data using computation-enabling encryption, with an online key management system. We further introduce Sanctuary to provide computation on any data on third-party systems using trusted hardware. Finally, we introduce Shoal, our query optimizer that considers the heterogeneous nature of streaming systems at optimization time to improve query performance when access controls are enforced on the streaming data. Through the union of the contributions of this dissertation, we show that considering access controls at optimization time can lead to better utilization, performance, and protection for streaming data.



Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Thoma, Cory (Email: cmt69@pitt.edu; Pitt Username: cmt69; ORCID: 0000-0002-9721-5117)
ETD Committee:
Committee CoChair: Lee, Adam (Email Address: adamlee@cs.pitt.edu; Pitt Username: adamlee; ORCID: 0000-0002-2596-7256)
Committee CoChair: Labrinidis, Alexandros (Email Address: labrinid@cs.pitt.edu; Pitt Username: labrinid; ORCID: 0000-0003-1349-0056)
Committee Member: Chrysanthis, Panos (Email Address: panos@cs.pitt.edu; Pitt Username: panos; ORCID: 0000-0001-7189-9816)
Date: 30 August 2019
Date Type: Publication
Defense Date: 16 May 2019
Approval Date: 30 August 2019
Submission Date: 30 July 2019
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 143
Institution: University of Pittsburgh
Schools and Programs: School of Computing and Information > Computer Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Access Controls, Data Streaming, Privacy, Confidentiality
Date Deposited: 30 Aug 2019 15:41
Last Modified: 30 Aug 2019 15:41
