Zong, Wei
(2023)
Statistical Modeling for High-Dimensional Omics Studies for Congruence, Heterogeneity and Clustering.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
High-dimensional omics data generated from high-throughput technologies capture molecular intricacy and variation, providing insights into the pathological development of human diseases. However, the high dimensionality makes it difficult to quantify heterogeneity and congruence statistically, both within a cohort and across studies. This dissertation focuses on methodology development for cross-species congruence analysis of transcriptomic responses (Chapter 2), multivariate guided clustering for disease subtyping (Chapter 3) and multi-facet clustering in omics data (Chapter 4).
Chapter 2 provides a congruence analysis framework for transcriptomic response analysis by developing quantitative concordance/discordance scores that incorporate data variability and pathway-centric investigation. This framework can be applied to cross-species and cross-tissue studies to help researchers quantify and visually identify molecular mechanisms and pathway subnetworks that are mimicked by model organisms, providing foundations for hypothesis generation and subsequent translational decisions.
Chapter 3 proposes a multivariate guided clustering model (mgClust) to identify molecular subtypes of a complex disease that are collectively associated with multiple clinical variables. Through extensive simulations, we show that mgClust improves clustering and feature selection performance over existing methods while accurately selecting clinical variables. Application to a lung disease dataset shows its benefit in enhancing mechanistic interpretation.
Chapter 4 proposes a model-based multi-facet clustering algorithm to simultaneously discover multiple meaningful partitions of samples. Facets with heterogeneous partitions are achieved by the competition of adjusted likelihoods in mixture models, while clusters within each facet are determined through the competition across individual Gaussian distributions. Application to multiple human brain tissue datasets shows its effectiveness in capturing multiple distinct perspectives nested in high-dimensional omics data.
Contribution to public health: The framework proposed in Chapter 2 provides a quantitative approach to identify biomarkers and topological gene regulatory modules that are best mimicked by the model organism, which will facilitate translational guidance of animal models. The model proposed in Chapter 3 can identify disease subtypes that are associated with clinical variables of interest, which has important implications for precision medicine. Chapter 4 provides a tool for simultaneously generating multiple partitions of samples reflecting different perspectives of the dataset, facilitating the discovery of new knowledge in diseases.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Zong, Wei
Date: 9 May 2023
Date Type: Publication
Defense Date: 4 April 2023
Approval Date: 9 May 2023
Submission Date: 14 April 2023
Access Restriction: 2 year -- Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 155
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: high-dimensional omics data, congruence, heterogeneity, clustering
Date Deposited: 10 May 2023 02:30
Last Modified: 10 May 2023 02:30
URI: http://d-scholarship.pitt.edu/id/eprint/44552