
Efficient Process Data Warehousing

Hsu, Ying-Feng (2016) Efficient Process Data Warehousing. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



This dissertation presents a data processing architecture for efficient data warehousing from historical data sources. It makes three primary contributions. The first is a generalized process data warehousing (PDW) architecture whose multilayer data processing steps transform raw data streams into useful information that facilitates data-driven decision making. The second is an exploration of the applicability of the proposed architecture to sparse process data. We tested the approach in a medical monitoring system that takes physiological data and predicts the clinical setting in which the data is most likely to be seen, and we performed a set of experiments with real clinical data (from Children’s Hospital of Pittsburgh) that demonstrate the high utility of the approach. The third is an exploration of the applicability of the proposed PDW architecture to redundant process data. We designed and developed a conflict-aware data fusion strategy for the efficient aggregation of historical data. We conducted a simulation-based study of the tradeoffs between the data fusion solutions and data accuracy, and also evaluated the solutions on a large-scale integrated framework (Tycho data) that includes historical data from heterogeneous sources in different subject areas. Finally, we propose and evaluate a state sequence recovery (SSR) framework, which integrates the two preceding studies on sparse and redundant data. Our experimental results are based on several algorithms that were developed and tested in different simulation scenarios under both normal and exponential data distributions.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Hsu, Ying-Feng (Email: yih13@pitt.edu; Pitt Username: YIH13)
ETD Committee:
Committee Chair: Zadorozhny, Vladimir (Email: vladimir@sis.pitt.edu; Pitt Username: VIZ)
Committee Member: Druzdzel, Marek J. (Email: marek@sis.pitt.edu; Pitt Username: DRUZDZEL)
Committee Member: Karimi, Hassan A. (Email: hkarimi@pitt.edu; Pitt Username: HKARIMI)
Committee Member: Spring, Michael B (Email: spring@pitt.edu; Pitt Username: SPRING)
Date: 13 January 2016
Date Type: Publication
Defense Date: 13 May 2015
Approval Date: 13 January 2016
Submission Date: 4 December 2015
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 220
Institution: University of Pittsburgh
Schools and Programs: School of Information Sciences > Information Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Data Warehouse, Data Fusion, Time-series Data Analyzing, Pattern Recognition, Machine Learning
Date Deposited: 13 Jan 2016 16:18
Last Modified: 15 Nov 2016 14:31

