
Clustering, Biomarker and Cancer Model Selection Using Omics Data

Zou, Jian (2023) Clustering, Biomarker and Cancer Model Selection Using Omics Data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

This is the latest version of this item.

PDF (11MB). Restricted to University of Pittsburgh users only until 9 May 2025.


The central dogma reshaped biomedical science. Since its formulation, biomedical researchers have focused mostly on the relationships among DNA, RNA, and protein, and numerous biotechnologies have been created to quantify their sequence, structure, and abundance. High-throughput technologies, which have emerged since the 2000s, offer researchers an opportunity to understand disease mechanisms thoroughly, but they also bring many statistical challenges. This dissertation focuses on constrained clustering (Chapter 2), multi-study multi-class concordant biomarker detection (Chapter 3), and cancer model selection (Chapter 4) in high-throughput omics data analysis.

In Chapter 2, we proposed the Constrained Gaussian Mixture Model (CGMM), which extends the Gaussian mixture model (GMM) to address the problem of empty or small clusters. We also generalized CGMM to a sparse CGMM (SCGMM) by adding an L1 penalty for gene selection. Extensive simulations and three real applications demonstrated the superior performance of the proposed method.

In Chapter 3, we proposed a two-step framework, Multi-Study Multi-Class Concordance (MSCC), to detect biomarkers in multi-class analysis across multiple studies from an information-theoretic perspective. We first detect biomarkers with partially shared concordant patterns across multiple studies and then identify the studies that contribute to this concordance. Simulations and four real-world data applications showed that MSCC outperforms min-MCC, the only existing method for this problem so far.

In Chapter 4, we developed Congruence Analysis and Selection of CAncer Models (CASCAM), a statistical and machine learning framework for authenticating and selecting the most representative cancer models in a pathway-specific and drug-relevant manner using transcriptomics data. CASCAM provides harmonization between tumor and cancer model omics data, interpretable machine learning for congruence quantification, mechanistic investigation, and pathway-based topological visualization to determine the most appropriate cancer model. The workflow is demonstrated on the invasive lobular breast carcinoma (ILC) subtype, where it credentials highly relevant models. The method is generalizable to any cancer subtype and is intended to further research in precision medicine.

Contribution to public health:
The proposed clustering, biomarker detection, and cancer model selection methods for omics data support the mechanistic understanding of disease, which can in turn lead to translational and clinical research. This research advances knowledge toward precision medicine and benefits public health.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators:
  • Zou, Jian | Email: jian.zou@pitt.edu | Pitt Username: jiz179 | ORCID: 0009-0006-9624-6487
ETD Committee:
  • Committee Chair: Tseng, George C. | Email: ctseng@pitt.edu | Pitt Username: ctseng
  • Committee Member: Wang, Jiebiao | Email: JBWANG@pitt.edu | Pitt Username: jbwang
  • Committee Member: Chen, Wei | Email: wei.chen@chp.edu | Pitt Username: wec47
  • Committee Member: Lee, Adrian V. | Email: leeav@upmc.edu | Pitt Username: avl10
Date: 9 May 2023
Date Type: Publication
Defense Date: 24 April 2023
Approval Date: 9 May 2023
Submission Date: 17 April 2023
Access Restriction: 2 years -- Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 126
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Cancer Biology, Machine Learning, Precision Medicine
Date Deposited: 10 May 2023 02:33
Last Modified: 10 May 2023 02:33

Available Versions of this Item

  • Clustering, Biomarker and Cancer Model Selection Using Omics Data. (deposited 10 May 2023 02:33) [Currently Displayed]

