Integrative Analysis of Variation Structure in High-dimensional Multi-block Data

Lee, Sungwon (2017) Integrative Analysis of Variation Structure in High-dimensional Multi-block Data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)


Multi-block data refers to the setting in which multiple data sets, possibly from different platforms, are measured on common subjects. This data type is ubiquitous in modern science, and such data are becoming increasingly high-dimensional. For example, in genetic studies, it is common to evaluate gene expression, microRNA and DNA methylation levels on a single tissue sample and, thanks to advances in microarray technology, scientists examine thousands of genes in a single experiment. Separate analyses of the individual data sets fail to capture the associations among them, which could encode valuable information for a better understanding of the target subjects. There is therefore a strong need for new statistical methods that analyze high-dimensional multi-block data in an integrative and unified way.

This dissertation consists of three parts whose shared theme is the identification of meaningful variations in multi-block data that account for the complex associations between the component data sets. The identified variations are then used for several statistical purposes: characterizing data in a precise and interpretable way; estimating the weights used to compute scores that give maximal correlation; identifying how ancillary data affect variations across multi-block data; and serving as an effective dimension reduction for classification. In the first part, we propose a non-linear extension of functional principal component analysis that captures the major variability in functional data exhibiting both amplitude and phase variations by taking into account the association between those two types of variation. The second part is an asymptotic study of canonical correlation analysis in which the dimension grows while the sample size remains fixed. In the third part, we devise a supervised multi-block data factorization scheme that decomposes the primary data sets with guidance from auxiliary data sets. The estimated layers of the resulting decomposition provide detailed information on variation structures and supervision effects. The advantages of an integrative analysis of multi-block data are demonstrated by simulation studies and real data applications, including pediatric growth curve, lip motion and gene expression-microRNA data analyses.


Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators:
Lee, Sungwon (Email: sul23@pitt.edu; Pitt Username: SUL23)
ETD Committee:
Committee Chair: Jung, Sungkyu (Email: sungkyu@pitt.edu; Pitt Username: SUNGKYU)
Committee Member: Iyengar, Satish (Email: ssi@pitt.edu; Pitt Username: SSI)
Committee Member: Chen, Kehui (Email: khchen@pitt.edu; Pitt Username: KHCHEN)
Committee Member: Tseng, George (Email: ctseng@pitt.edu; Pitt Username: CTSENG)
Date: 26 January 2017
Date Type: Publication
Defense Date: 19 October 2016
Approval Date: 26 January 2017
Submission Date: 1 November 2016
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 147
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Statistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: integrative analysis, variation structure, high-dimensional data, multi-block data
Date Deposited: 26 Jan 2017 16:01
Last Modified: 26 Jan 2018 06:15

