Lee, Sungwon
(2017)
Integrative Analysis of Variation Structure in High-dimensional Multi-block Data.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Multi-block data arise when multiple data sets, possibly from different platforms, are measured on common subjects. This data type is ubiquitous in modern science, and the data are becoming increasingly high-dimensional. For example, in genetic studies it is common to evaluate gene expression, microRNA, and DNA methylation levels on a single tissue sample, and, thanks to advances in microarray technology, scientists examine thousands of genes in a single experiment. Separate analyses of the individual data sets do not capture the associations among them, which could encode valuable information for better understanding the target subjects. There is therefore a strong need for new statistical methods that analyze high-dimensional multi-block data in an integrative and unified way.
This dissertation consists of three parts whose shared theme is identifying meaningful variations in multi-block data that account for the complex associations between component data sets. The identified variations are then used for several statistical purposes: characterizing data in a precise and interpretable way; estimating the weights used to compute scores that give maximal correlation; identifying how ancillary data affect variations across multi-block data; and serving as an effective dimension reduction for classification. In the first part, we propose a non-linear extension of functional principal component analysis that captures the major variability in functional data exhibiting both amplitude and phase variation by taking into account the association between the two. The second part is an asymptotic study of canonical correlation analysis in which the dimension grows while the sample size remains fixed. In the third part, we devise a supervised multi-block data factorization scheme that decomposes the primary data sets with guidance from auxiliary data sets. The estimated layers of the resulting decomposition provide detailed information on variation structures and supervision effects. The advantages of an integrative analysis of multi-block data are demonstrated through simulation studies and real data applications, including analyses of pediatric growth curves, lip motion, and gene expression-microRNA data.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Lee, Sungwon
Date: 26 January 2017
Date Type: Publication
Defense Date: 19 October 2016
Approval Date: 26 January 2017
Submission Date: 1 November 2016
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 147
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Statistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: integrative analysis, variation structure, high-dimensional data, multi-block data
Date Deposited: 26 Jan 2017 16:01
Last Modified: 26 Jan 2018 06:15
URI: http://d-scholarship.pitt.edu/id/eprint/30088