
High-Dimensional Inference of Modified Poisson-Type Graphical Models and Robust Sparse CCA, with Applications to Large-Scale Omics Data

Zhang, Rong (2020) High-Dimensional Inference of Modified Poisson-Type Graphical Models and Robust Sparse CCA, with Applications to Large-Scale Omics Data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Primary Text (PDF, 5MB)
Restricted to University of Pittsburgh users only until 16 September 2022.

Abstract

Recent advances in high-throughput sequencing have generated many types of high-dimensional omics data. Although remarkable progress has been made in statistical inference for the high-dimensional Gaussian graphical model (GGM) in gene co-expression network analysis and in sparse canonical correlation analysis (CCA) for multi-omics studies, efficient computation remains a major concern, and methods that go beyond the Gaussian assumption remain largely unexplored. To address both the computational and the methodological challenges, this dissertation covers efficient implementations of statistical inference for the high-dimensional GGM (the first part), and novel statistical methods for count-valued RNA-seq data in gene co-expression network analysis (the second part) and for heavy-tailed CITE-seq data in multi-omics studies (the third part).
In the first part of the dissertation, we develop an extensive and efficient R package named SILGGM (Statistical Inference of Large-scale Gaussian Graphical Model) that includes four main approaches to statistical inference for the high-dimensional GGM. Extensive comparisons illustrate that SILGGM can accelerate existing implementations by several to dozens of orders of magnitude without loss of accuracy. The package is freely available via CRAN at https://cran.r-project.org/package=SILGGM.
In the second part of the dissertation, we propose a novel two-step procedure for both edge-wise and global statistical inference in three modified Poisson-type graphical models, using a cutting-edge generalized low-dimensional projection approach for bias correction. An extensive simulation study illustrates the asymptotic normality of the edge-wise inference and shows more accurate multiple-testing results than estimation alone or inference under a normality assumption. An application to a novel count-valued RNA-seq data set on childhood atopic asthma in Puerto Ricans yields more biologically meaningful results than estimation alone or inferential methods based on Gaussian and nonparanormal graphical models.
In the third part of the dissertation, we propose R-CoLaR, a novel Robust Convex Program with group-Lasso Refinement that incorporates cutting-edge tail-robust covariance estimation for sparse CCA. Numerical studies and an analysis of heavy-tailed CITE-seq data from a mucosa-associated lymphoid tissue (MALT) tumor illustrate the validity of R-CoLaR and its noticeable advantages over existing sparse CCA methods, namely more accurate estimation and better interpretation of protein-RNA correlation.



Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Creators | Email | Pitt Username | ORCID
Zhang, Rong | roz16@pitt.edu | roz16 | 0000-0002-1163-8187
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Committee Chair | Ren, Zhao | zren@pitt.edu | |
Committee Member | Iyengar, Satish | ssi@pitt.edu | |
Committee Member | Chen, Kehui | khchen@pitt.edu | |
Committee Member | Chen, Wei | wei.chen@chp.edu | |
Date: 16 September 2020
Date Type: Publication
Defense Date: 30 June 2020
Approval Date: 16 September 2020
Submission Date: 5 July 2020
Access Restriction: 2 year -- Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 175
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Statistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Gene co-expression network; Multi-omics; High-dimensional statistical inference; Efficient package; Modified Poisson graphical model; RNA-seq; Bias correction; Sparse CCA; Heavy-tailed; Tail-robust covariance estimation.
Date Deposited: 16 Sep 2020 15:37
Last Modified: 16 Sep 2020 15:37
URI: http://d-scholarship.pitt.edu/id/eprint/39324
