Clustering, Biomarker and Cancer Model Selection Using Omics Data

Zou, Jian (2023) Clustering, Biomarker and Cancer Model Selection Using Omics Data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

This is the latest version of this item.


Abstract

The central dogma reshaped biomedical science. Since then, biomedical researchers have focused mostly on the relationships among DNA, RNA, and protein, and numerous biotechnologies have been created to quantify their sequence, structure, and abundance. High-throughput technologies, which emerged in the 2000s, offer researchers an unprecedented opportunity to understand disease mechanisms thoroughly, but they also bring many statistical challenges. This dissertation focuses on constrained clustering (Chapter 2), multi-study multi-class concordant biomarker detection (Chapter 3), and cancer model selection (Chapter 4) in high-throughput omics data analysis.

In Chapter 2, we proposed the Constrained Gaussian Mixture Model (CGMM), which extends the Gaussian mixture model (GMM) to address the problem of empty or small clusters. We also generalized CGMM to sparse CGMM (SCGMM) using an L1 penalty for gene selection. Extensive simulations and three real applications demonstrated the superior performance of the proposed method.
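To illustrate the empty or small cluster problem that Chapter 2 targets, the following minimal sketch fits a diagonal-covariance GMM by EM and applies a simple floor to the mixing weights. This is not the CGMM estimator from the dissertation, whose constraint formulation is not described in this abstract. The function name `fit_floored_gmm`, the `min_weight` parameter, and the clip-and-renormalize heuristic are all assumptions made purely for illustration.

```python
import numpy as np

def fit_floored_gmm(X, k, min_weight=0.05, n_iter=100, seed=0):
    """EM for a diagonal-covariance GMM with a heuristic floor on mixing weights.

    Illustrative only: the floor is a stand-in for a cluster-size constraint,
    not the constraint used by CGMM.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialize means at k distinct random samples, shared variances, equal weights.
    means = X[rng.choice(n, size=k, replace=False)].astype(float)
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: responsibilities computed in log space for numerical stability.
        diff = X[:, None, :] - means[None, :, :]              # shape (n, k, d)
        log_pdf = -0.5 * (
            np.sum(diff ** 2 / variances[None], axis=2)
            + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
        )
        log_r = np.log(weights)[None, :] + log_pdf
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: weighted means and variances per component.
        nk = resp.sum(axis=0) + 1e-12
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + 1e-6

        # Assumed heuristic: clip small weights upward, then renormalize.
        # This keeps every weight strictly positive so no component vanishes.
        weights = np.maximum(nk / n, min_weight)
        weights /= weights.sum()

    return weights, means, variances, resp.argmax(axis=1)
```

With an unconstrained GMM, asking for more components than the data support can drive a component's weight toward zero. The floor here only prevents that collapse; a principled constrained estimator would build the size constraint into the likelihood optimization itself.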

In Chapter 3, we proposed a two-step framework, Multi-Study Multi-Class Concordance (MSCC), to detect biomarkers in multi-class analysis across multiple studies from an information-theoretic perspective. We first detect biomarkers with partially shared concordant patterns across multiple studies and then identify the studies that contribute to such concordance. Simulations and four real-world data applications showed that MSCC outperforms min-MCC, the only existing method for this problem.

In Chapter 4, we developed Congruence Analysis and Selection of CAncer Models (CASCAM), a statistical and machine learning framework for authenticating and selecting the most representative cancer models in a pathway-specific and drug-relevant manner using transcriptomics data. CASCAM provides harmonization between tumor and cancer model omics data, interpretable machine learning for congruence quantification, mechanistic investigation, and pathway-based topological visualization to select the most appropriate cancer model. The workflow is demonstrated on the invasive lobular breast carcinoma (ILC) subtype, credentialing highly relevant models. The method is generalizable to any cancer subtype and is intended to further research in precision medicine.

Contribution to public health:
The proposed clustering, biomarker, and cancer model selection methods using omics data are crucial for understanding disease mechanisms, which can lead to translational and clinical research. The related research advances knowledge toward precision medicine and benefits public health.


Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Zou, Jian	jian.zou@pitt.edu	jiz179	0009-0006-9624-6487

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee Chair	Tseng, George C.	ctseng@pitt.edu	ctseng
Committee Member	Wang, Jiebiao	JBWANG@pitt.edu	jbwang
Committee Member	Chen, Wei	wei.chen@chp.edu	wec47
Committee Member	Lee, Adrian V.	leeav@upmc.edu	avl10

Date:

9 May 2023

Date Type:

Publication

Defense Date:

24 April 2023

Approval Date:

9 May 2023

Submission Date:

17 April 2023

Access Restriction:

2 year -- Restrict access to University of Pittsburgh for a period of 2 years.

Number of Pages:

126

Institution:

University of Pittsburgh

Schools and Programs:

School of Public Health > Biostatistics

Degree:

PhD - Doctor of Philosophy

Thesis Type:

Doctoral Dissertation

Refereed:

Yes

Uncontrolled Keywords:

Cancer Biology, Machine Learning, Precision Medicine

Date Deposited:

10 May 2023 02:33

Last Modified:

09 May 2025 12:15

URI:

http://d-scholarship.pitt.edu/id/eprint/44689

Available Versions of this Item

Clustering, Biomarker and Cancer Model Selection Using Omics Data. (deposited 10 May 2023 02:33) [Currently Displayed]
