Zhang, Rong
(2020)
High-Dimensional Inference of Modified Poisson-Type Graphical Models and Robust Sparse CCA, with Applications to Large-Scale Omics Data.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Recent advances in high-throughput sequencing have generated many types of high-dimensional omics data. Although remarkable progress has been made in statistical inference for high-dimensional Gaussian graphical models (GGMs) in gene co-expression network analysis and in sparse canonical correlation analysis (CCA) for multi-omics studies, efficient computation remains a major concern, and methods that go beyond the Gaussian assumption are still largely undeveloped. To address both the computational and the methodological challenges, this dissertation covers efficient implementations of statistical inference for high-dimensional GGMs (the first part), novel statistical methods for count-valued RNA-seq data in gene co-expression network analysis (the second part), and novel statistical methods for heavy-tailed CITE-seq data in multi-omics studies (the third part).
In the first part of the dissertation, we develop a comprehensive and efficient R package named SILGGM (Statistical Inference of Large-scale Gaussian Graphical Model) that includes four main approaches to statistical inference for high-dimensional GGMs. Extensive comparisons illustrate that SILGGM can accelerate existing implementations by several to dozens of orders of magnitude without loss of accuracy. The package is freely available via CRAN at https://cran.r-project.org/package=SILGGM.
In the second part of the dissertation, we propose a novel two-step procedure for both edge-wise and global statistical inference in three modified Poisson-type graphical models, using a cutting-edge generalized low-dimensional projection approach for bias correction. An extensive simulation study illustrates the asymptotic normality of the edge-wise inference and shows more accurate multiple-testing results than estimation alone or an inferential method based on a normality assumption. An application to a novel count-valued RNA-seq data set on childhood atopic asthma in Puerto Ricans yields more biologically meaningful results than estimation alone or inferential methods based on Gaussian and nonparanormal graphical models.
In the third part of the dissertation, we propose R-CoLaR, a novel Robust Convex Program with group-Lasso Refinement that incorporates cutting-edge tail-robust covariance estimation into sparse CCA. Numerical studies and an analysis of heavy-tailed CITE-seq data from a mucosa-associated lymphoid tissue (MALT) tumor illustrate the validity of R-CoLaR and its noticeable advantages over existing sparse CCA methods: more accurate estimation and better interpretation of protein-RNA correlation.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Zhang, Rong
Date: 16 September 2020
Date Type: Publication
Defense Date: 30 June 2020
Approval Date: 16 September 2020
Submission Date: 5 July 2020
Access Restriction: 2 year -- Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 175
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Statistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Gene co-expression network; Multi-omics; High-dimensional statistical inference; Efficient package; Modified Poisson graphical model; RNA-seq; Bias correction; Sparse CCA; Heavy-tailed; Tail-robust covariance estimation.
Date Deposited: 16 Sep 2020 15:37
Last Modified: 16 Sep 2022 05:15
URI: http://d-scholarship.pitt.edu/id/eprint/39324