Clustering, Biomarker and Cancer Model Selection Using Omics Data

Zou, Jian (2023) Clustering, Biomarker and Cancer Model Selection Using Omics Data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract

The central dogma of molecular biology has reshaped biomedical science. Since its formulation, biomedical researchers have focused largely on the relationships among DNA, RNA, and protein, and numerous biotechnologies have been developed to quantify their sequence, structure, and abundance. High-throughput technologies, which emerged in the 2000s, offer researchers an exceptional opportunity to understand disease mechanisms thoroughly, but they also pose many statistical challenges. This dissertation addresses three problems in high-throughput omics data analysis: constrained clustering (Chapter 2), multi-study multi-class concordant biomarker detection (Chapter 3), and cancer model selection (Chapter 4). In Chapter 2, we proposed the Constrained Gaussian Mixture Model (CGMM), which extends the Gaussian mixture model (GMM) to address the problem of empty or small clusters. We further generalized CGMM to a sparse CGMM (SCGMM) that uses an L1 penalty for gene selection. Extensive simulations and three real-data applications demonstrated the superior performance of the proposed methods. In Chapter 3, we proposed a two-step framework, Multi-Study Multi-Class Concordance (MSCC), to detect biomarkers in multi-class analyses across multiple studies from an information-theoretic perspective. We first detect biomarkers with partially shared concordant patterns across studies and then identify the studies that contribute to this concordance. Simulations and four real-world data applications showed its superiority over min-MCC, the only existing method for this problem to date. In Chapter 4, we developed Congruence Analysis and Selection of CAncer Models (CASCAM), a statistical and machine learning framework for authenticating and selecting the most representative cancer models in a pathway-specific and drug-relevant manner using transcriptomics data.
CASCAM provides harmonization between tumor and cancer model omics data, interpretable machine learning for congruence quantification, mechanistic investigation, and pathway-based topological visualization to guide selection of the most appropriate cancer model. We demonstrate the workflow on the invasive lobular breast carcinoma (ILC) subtype, credentialing highly relevant models. The method generalizes to any cancer subtype and is expected to advance research in precision medicine.

Contribution to public health: