
Clustering and Association Analysis for High-Dimensional Omics Studies

Li, Yujia (2022) Clustering and Association Analysis for High-Dimensional Omics Studies. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

PDF: Restricted to University of Pittsburgh users only until 10 May 2024.


Abstract

With the rapid advancement of high-throughput technologies, a large amount of high-dimensional omics data has been generated in the public domain, which gives rise to various statistical and computational challenges in the cluster and association analysis of omics data. This dissertation addresses three problems in high-dimensional omics studies: estimation of tuning parameters in cluster analysis (Chapter 2), disease subtyping (Chapter 3), and association analysis between gene expression and multiple phenotypes (Chapter 4).

In Chapter 2, we propose a resampling framework called S4 that selects tuning parameters in cluster analysis by measuring the similarity (i.e., stability) between the clustering results of the whole data and of subsampled data. S4 can estimate the number of clusters for $K$-means, and can estimate the number of clusters and the sparsity parameter simultaneously for sparse $K$-means. Extensive simulations and nine real applications demonstrate the superior performance of the proposed S4 method.
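
As a rough illustration of the stability idea, the sketch below scores the agreement between a clustering of the whole data and a clustering of a subsample. The abstract does not say which similarity measure S4 uses, so the adjusted Rand index, the function names `adjusted_rand_index` and `stability_score`, and the toy labels are all assumptions made for illustration. This is not the S4 procedure itself.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two samples")

    # Contingency counts: samples sharing a cluster in both labelings.
    pair_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    index = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (index - expected) / (max_index - expected)


def stability_score(full_labels, sub_indices, sub_labels):
    """Agreement between full-data labels (restricted to the subsample)
    and the labels obtained by clustering the subsample alone.

    Hypothetical helper: the choice of similarity measure is an assumption.
    """
    restricted = [full_labels[i] for i in sub_indices]
    return adjusted_rand_index(restricted, sub_labels)


# Toy example: the subsample clustering recovers the same partition,
# only with different label names.
full_labels = [0, 0, 0, 1, 1, 1]
sub_indices = [0, 1, 3, 4]
sub_labels = [5, 5, 7, 7]
print(stability_score(full_labels, sub_indices, sub_labels))  # 1.0
```

In a resampling scheme of this kind, such a score would typically be averaged over many subsamples for each candidate tuning parameter value, with higher average agreement suggesting a more stable choice.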

In Chapter 3, we propose a novel outcome-guided disease subtyping framework based on a weighted joint likelihood approach. Conventional cluster analysis (e.g., sparse $K$-means) identifies subgroups of patients with similar expression patterns without considering outcome information, so the identified subgroups can be irrelevant to the clinical outcome of interest. The proposed method addresses this issue by incorporating outcome information into the cluster analysis, and it performs well in both discovery and validation data.

In Chapter 4, we study the association between gene expression and multiple correlated phenotypes in complex diseases. We extend two P-value combination methods, the adaptive weighted Fisher’s method (AFp) and the adaptive Fisher’s method (AFz), to tackle this problem. Based on extensive evaluation, AFp is recommended. An application to real lung disease transcriptomic data shows that AFp yields insightful biological findings.
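
For readers unfamiliar with P-value combination, the sketch below shows the classical (unweighted) Fisher's method that AFp and AFz build on: the statistic $-2\sum_i \log p_i$ is compared with a chi-square distribution with $2k$ degrees of freedom. It does not reproduce AFp or AFz, whose details are not given in the abstract. The function name `fisher_combine` is hypothetical, and the chi-square reference distribution assumes independent P-values, which does not hold for correlated phenotypes.

```python
import math


def fisher_combine(p_values):
    """Classical Fisher's method for k independent p-values.

    Returns the statistic -2 * sum(log p_i) and its p-value under a
    chi-square distribution with 2k degrees of freedom.
    Illustrative sketch only; not the AFp or AFz procedure.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # Survival function of chi-square with even degrees of freedom 2k:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    combined = min(math.exp(-half) * total, 1.0)
    return statistic, combined


# Two moderately small p-values combine into stronger joint evidence.
stat, p_combined = fisher_combine([0.01, 0.01])
print(stat, p_combined)
```

With a single input the combined P-value equals the input, which is a quick sanity check on the chi-square tail computation.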

Contribution to public health:
The methods proposed in Chapter 2 are crucial for cluster analysis of high-dimensional omics data, since tuning parameters can be critical for the results and their interpretation. Chapter 3 proposes a novel outcome-guided cluster analysis framework for disease subtyping. Chapter 4 provides a practical framework for analyzing association patterns between multiple phenotypes and gene expression in complex diseases. The proposed methods are all essential tools for uncovering disease mechanisms and developing efficient treatments toward precision medicine.



Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Creators | Email | Pitt Username | ORCID
Li, Yujia | yul178@pitt.edu | yul178 | 0000-0002-1024-9243
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Committee Chair | Tseng, George | ctseng@pitt.edu | ctseng |
Committee Member | Ding, Ying | YINGDING@pitt.edu | yingding |
Committee Member | Tang, Lu | LUTANG@pitt.edu | lutang |
Committee Member | Chen, Wei | wei.chen@chp.edu | |
Date: 10 May 2022
Date Type: Publication
Defense Date: 8 April 2022
Approval Date: 10 May 2022
Submission Date: 12 April 2022
Access Restriction: 2 years -- Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 154
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: PhD dissertation, clustering analysis, association analysis, omics data.
Date Deposited: 10 May 2022 18:43
Last Modified: 10 May 2022 18:43
URI: http://d-scholarship.pitt.edu/id/eprint/42601
