
Statistical Modeling for High-Dimensional Omics Studies for Congruence, Heterogeneity and Clustering

Zong, Wei (2023) Statistical Modeling for High-Dimensional Omics Studies for Congruence, Heterogeneity and Clustering. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

PDF (14MB), restricted to University of Pittsburgh users only until 9 May 2025.


High-dimensional omics data generated by high-throughput technologies capture molecular complexity and variation, providing insights into the pathological development of human diseases. However, their high dimensionality makes it difficult to quantify heterogeneity and congruence statistically, both within a cohort and across studies. This dissertation develops methodology for cross-species congruence analysis of transcriptomic responses (Chapter 2), multivariate guided clustering for disease subtyping (Chapter 3) and multi-facet clustering of omics data (Chapter 4).
Chapter 2 provides a congruence analysis framework for transcriptomic responses by developing quantitative concordance/discordance scores that account for data variability and by incorporating pathway-centric investigation. The framework can be applied to cross-species and cross-tissue studies to help researchers quantify and visually identify the molecular mechanisms and pathway subnetworks that are mimicked by model organisms, providing a foundation for hypothesis generation and subsequent translational decisions.
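To make the idea of a concordance/discordance score concrete, the Python sketch below computes a simple expected-agreement measure between two studies. It is an illustration under stated assumptions, not the scoring method developed in Chapter 2. Each gene is assumed to have posterior probabilities of up- and down-regulation in each study, the two studies are treated as independent, and the function name `expected_concordance` and the example probabilities are hypothetical.

```python
from typing import List, Tuple

# Hypothetical illustration only; not the dissertation's concordance/discordance scores.
def expected_concordance(
    probs_a: List[Tuple[float, float]],
    probs_b: List[Tuple[float, float]],
) -> Tuple[float, float]:
    """Return (concordance, discordance) averaged over genes.

    probs_a[g] = (P(up in study A), P(down in study A)) for gene g; likewise for B.
    Assumes the two studies are independent (an illustrative assumption).
    """
    if len(probs_a) != len(probs_b) or not probs_a:
        raise ValueError("need two non-empty lists of equal length")
    conc = disc = 0.0
    for (up_a, down_a), (up_b, down_b) in zip(probs_a, probs_b):
        conc += up_a * up_b + down_a * down_b  # same direction in both studies
        disc += up_a * down_b + down_a * up_b  # opposite directions
    n = len(probs_a)
    return conc / n, disc / n


# Hypothetical posterior probabilities for three genes in a mouse and a human study.
mouse = [(0.9, 0.05), (0.1, 0.8), (0.5, 0.5)]
human = [(0.8, 0.1), (0.7, 0.2), (0.0, 0.0)]
c, d = expected_concordance(mouse, human)
print(f"concordance={c:.3f}, discordance={d:.3f}")
```

Using probabilities rather than hard differential-expression calls is one simple way to let uncertainty in each study propagate into the score: genes that are unlikely to be differentially expressed in either study contribute little to either total.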
Chapter 3 proposes a multivariate guided clustering model (mgClust) to identify molecular subtypes of a complex disease that are jointly associated with multiple clinical variables. Extensive simulations show that, compared with existing methods, mgClust improves clustering and feature selection performance while accurately selecting the relevant clinical variables. An application to a lung disease dataset shows its benefit in enhancing mechanistic interpretation.
Chapter 4 proposes a model-based multi-facet clustering algorithm that simultaneously discovers multiple meaningful partitions of the samples. Facets with heterogeneous partitions arise from competition among the adjusted likelihoods of the mixture models, while clusters within each facet are determined by competition among the individual Gaussian distributions. Applications to multiple human brain tissue datasets show its effectiveness in capturing multiple distinct perspectives nested in high-dimensional omics data.
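The within-facet step, in which clusters compete through individual Gaussian distributions, resembles the E-step of a standard Gaussian mixture model. The Python sketch below shows only that generic mechanism for a single feature with fixed, hypothetical parameters. It does not implement the facet-level competition of adjusted likelihoods or the dissertation's actual algorithm, and the function names are illustrative.

```python
import math
from typing import List, Sequence

# Generic Gaussian-mixture E-step; hypothetical illustration, not the multi-facet model.
def gaussian_logpdf(x: float, mean: float, sd: float) -> float:
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def responsibilities(
    x: float, weights: Sequence[float], means: Sequence[float], sds: Sequence[float]
) -> List[float]:
    """Posterior probability that x was generated by each Gaussian component."""
    logs = [math.log(w) + gaussian_logpdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    top = max(logs)  # log-sum-exp shift for numerical stability
    unnorm = [math.exp(lp - top) for lp in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def hard_assign(
    xs: Sequence[float], weights: Sequence[float], means: Sequence[float], sds: Sequence[float]
) -> List[int]:
    """Assign each sample to the component with the largest responsibility."""
    labels = []
    for x in xs:
        r = responsibilities(x, weights, means, sds)
        labels.append(max(range(len(r)), key=r.__getitem__))
    return labels


# Hypothetical two-component mixture on a single expression feature.
weights, means, sds = [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]
print(hard_assign([-0.3, 0.8, 4.1, 6.2], weights, means, sds))
```

Each sample goes to the component with the largest posterior probability. Working on the log scale with a log-sum-exp shift avoids underflow when the densities are very small.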
Contribution to public health: The framework proposed in Chapter 2 provides a quantitative approach to identifying biomarkers and topological gene regulatory modules that are best mimicked by the model organism, which will help guide the translational use of animal models. The model proposed in Chapter 3 can identify disease subtypes associated with clinical variables of interest, which has important implications for precision medicine. Chapter 4 provides a tool for simultaneously generating multiple partitions of samples that reflect different perspectives of the dataset, facilitating the discovery of new knowledge about diseases.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Zong, Wei (Email: wez97@pitt.edu; Pitt username: wez97)
ETD Committee:
Thesis Advisor: Tseng, George
Committee Member: Tang,
Committee Member: Park, Hyun Jung (Email: hyp15@pitt.edu; Pitt username: hyp15)
Committee Member: Yu, Guan (Email: guy24@pitt.edu; Pitt username: guy24)
Date: 9 May 2023
Date Type: Publication
Defense Date: 4 April 2023
Approval Date: 9 May 2023
Submission Date: 14 April 2023
Access Restriction: 2 years; access restricted to University of Pittsburgh users for a period of 2 years.
Number of Pages: 155
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: high-dimensional omics data, congruence, heterogeneity, clustering
Date Deposited: 10 May 2023 02:30
Last Modified: 10 May 2023 02:30

