
Outcome-Guided Disease Subtyping and Power Calculation for High-Dimensional Omics Studies

Liu, Peng (2021) Outcome-Guided Disease Subtyping and Power Calculation for High-Dimensional Omics Studies. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



With the rapid advancement of high-throughput technologies, a large amount of high-dimensional omics data has been generated and made publicly available, giving rise to various statistical and computational challenges in the design and analysis of omics experiments. This dissertation addresses disease subtyping (Chapters 2 and 3) and power calculation (Chapter 4) in the analysis of high-dimensional omics studies.

In Chapter 2, we proposed an outcome-guided disease subtyping framework called ogClust. Disease subtyping with omics data usually relies on conventional clustering methods, which primarily aim to identify subpopulations with similar patterns in gene features. Because outcome information is not considered during clustering, the identified disease subtypes are often not associated with the outcome. ogClust uses a continuous or survival clinical outcome to guide subtyping: it identifies disease subtypes together with their driving genes and ensures that the resulting subtypes are associated with the outcome of interest.
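To make the contrast between conventional and outcome-guided clustering concrete, here is a minimal sketch under stated assumptions. It is not ogClust itself; it swaps in a much simpler screen-then-cluster heuristic: rank genes by absolute Pearson correlation with a continuous outcome, keep the top k, and run 2-means on those genes only. The toy data, the helper names (`screen_genes`, `two_means`, `outcome_gap`) and the deterministic farthest-point initialization are hypothetical choices made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screen_genes(X, y, top_k):
    """Indices of the top_k genes (columns of X) most correlated with outcome y."""
    n_genes = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: -scores[j])[:top_k]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def two_means(X, cols, max_iter=100):
    """Plain 2-means (Lloyd's algorithm) on the selected columns only.

    Deterministic init: sample 0 and the sample farthest from it.
    """
    pts = [[row[j] for j in cols] for row in X]
    far = max(range(len(pts)), key=lambda i: sq_dist(pts[i], pts[0]))
    centers = [pts[0], pts[far]]
    labels = None
    for _ in range(max_iter):
        # Assign each sample to its nearest center (ties go to cluster 0).
        new = [min((0, 1), key=lambda k: sq_dist(p, centers[k])) for p in pts]
        if new == labels:
            break
        labels = new
        # Move each center to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(pts, labels) if lab == k]
            if members:
                centers[k] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def outcome_gap(labels, y):
    """Absolute difference in mean outcome between the two clusters."""
    means = []
    for k in (0, 1):
        vals = [v for v, lab in zip(y, labels) if lab == k]
        means.append(sum(vals) / len(vals))
    return abs(means[0] - means[1])


# Hypothetical toy data: gene 0 carries a strong split unrelated to y,
# gene 1 carries a weak split that drives y.
X = [[10, 1], [10, 1], [10, -1], [10, -1],
     [-10, 1], [-10, 1], [-10, -1], [-10, -1]]
y = [5, 5, 0, 0, 5, 5, 0, 0]

conventional = two_means(X, cols=[0, 1])
guided = two_means(X, cols=screen_genes(X, y, top_k=1))
print(conventional, outcome_gap(conventional, y))  # [0, 0, 0, 0, 1, 1, 1, 1] 0.0
print(guided, outcome_gap(guided, y))              # [0, 0, 1, 1, 0, 0, 1, 1] 5.0
```

Clustering on all genes follows the dominant but outcome-irrelevant gene 0 and yields clusters with identical mean outcome, while the screened version recovers the outcome-associated split. A joint model such as the one the abstract describes aims to achieve this without a hard pre-screening step.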

In Chapter 3, we extended the ogClust model to integrate multi-omics data and to incorporate biological information via the sparse overlapping group lasso, with the aim of improving the accuracy and interpretability of feature selection and disease subtyping. An EM algorithm combined with the alternating direction method of multipliers (ADMM) is applied for fast optimization.
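The abstract names the penalty and the solver but not their details, so the following is only a generic sketch of the building blocks an ADMM step for such a penalty typically uses; it is not the dissertation's algorithm. For a single (non-overlapping) group, the proximal operator of the sparse group lasso penalty lam1*||v||_1 + lam2*||v||_2 is elementwise soft-thresholding followed by group-wise shrinkage. For overlapping groups, a common ADMM device is to give each group g its own copy z_g of the coefficients it covers, with the constraint z_g = beta[G_g]; the z-update then decouples into one block soft-threshold per group. The function names and the scaled-dual convention are assumptions.

```python
import math


def soft_threshold(v, t):
    """Elementwise prox of t * ||v||_1: shrink each entry toward 0 by t."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def group_shrink(v, t):
    """Prox of t * ||v||_2 (block soft-thresholding): scale v toward 0."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * x for x in v]


def sparse_group_prox(v, lam1, lam2):
    """Prox of lam1*||v||_1 + lam2*||v||_2 for one non-overlapping group."""
    return group_shrink(soft_threshold(v, lam1), lam2)


def overlapping_group_z_update(beta, duals, groups, lam, rho):
    """Hypothetical ADMM z-step for lam * sum_g ||z_g||_2 s.t. z_g = beta[groups[g]].

    beta:   current coefficient vector
    duals:  one scaled dual vector u_g per group
    groups: lists of (possibly overlapping) coefficient indices
    rho:    ADMM penalty parameter
    Because each group has its own copy, overlapping groups decouple and
    every copy is updated as group_shrink(beta[G_g] + u_g, lam / rho).
    """
    z = []
    for idx, u in zip(groups, duals):
        v = [beta[j] + uj for j, uj in zip(idx, u)]
        z.append(group_shrink(v, lam / rho))
    return z


print(sparse_group_prox([3.0, 1.0], lam1=1.0, lam2=1.0))  # [1.0, 0.0]
```

In a full ADMM loop, this z-step would alternate with a beta-update (for example, an EM-weighted least-squares or likelihood step) and a dual update u_g <- u_g + beta[G_g] - z_g; those pieces depend on the model and are not sketched here.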

In Chapter 4, we proposed a power calculation and study design method, MethylSeqDesign, for bisulfite DNA methylation sequencing (Methyl-Seq) studies. A three-step sequential procedure performs genome-wide power calculation while simultaneously considering sample size and sequencing depth. The performance of the method was evaluated with extensive simulations, and two real examples were analyzed to illustrate the approach.
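Why sample size and sequencing depth must be considered jointly can be seen from a back-of-the-envelope calculation. The sketch below is not MethylSeqDesign's three-step procedure; it swaps in a single-CpG normal approximation under these assumptions: each sample's observed methylation proportion follows a beta-binomial model with mean p, read depth D and intra-class correlation rho, so its variance is p(1-p)[(1-rho)/D + rho]; the two groups are compared with a two-sided z-test on mean proportions; and genome-wide multiplicity is handled crudely through the per-CpG significance level (e.g. Bonferroni alpha/m). All function names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_variance(p, depth, rho):
    """Variance of one sample's observed proportion X/depth when
    X ~ BetaBinomial(depth, mean p, intra-class correlation rho)."""
    return p * (1.0 - p) * ((1.0 - rho) / depth + rho)


def approx_power(p1, p2, n_per_group, depth, rho, alpha):
    """Normal-approximation power of a two-sided z-test comparing mean
    methylation between two groups of n_per_group samples at one CpG.
    The (negligible) rejection probability in the opposite tail is ignored."""
    std = NormalDist()
    var = proportion_variance(p1, depth, rho) + proportion_variance(p2, depth, rho)
    se = math.sqrt(var / n_per_group)
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)
    return std.cdf(abs(p1 - p2) / se - z_crit)


def power_grid(p1, p2, sizes, depths, rho, alpha):
    """Power for every (sample size, depth) pair, as a nested dict."""
    return {n: {d: approx_power(p1, p2, n, d, rho, alpha) for d in depths}
            for n in sizes}


# Hypothetical design: 30% vs 50% methylation, rho = 0.05,
# Bonferroni correction over one million CpGs.
alpha = 0.05 / 1_000_000
grid = power_grid(0.3, 0.5, sizes=[10, 20, 40], depths=[10, 30, 100],
                  rho=0.05, alpha=alpha)
for n, row in grid.items():
    print(n, {d: round(pw, 3) for d, pw in row.items()})
```

Because rho puts a floor of p(1-p)*rho on the per-sample variance, raising depth beyond a certain point buys little, whereas adding samples keeps shrinking the standard error. This is the trade-off a joint sample-size-and-depth calculation is meant to expose.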

Contribution to public health:

The methods proposed in Chapters 2 and 3 are useful for identifying outcome-associated clusters that are more likely to have distinct biological mechanisms or clinical significance, an essential first step towards precision medicine. The method proposed in Chapter 4 provides a useful tool for genome-wide power calculation and study design in Methyl-Seq studies.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Liu, Peng (Email: pel67@pitt.edu; Pitt Username: pel67; ORCID: 0000-0003-4012-3884)
ETD Committee:
Committee Chair: Tseng, George (Email: ctseng@pitt.edu; Pitt Username: ctseng)
Committee Co-Chair: Tang, Lu (Email: lutang@pitt.edu; Pitt Username: lutang)
Committee Member: Park, Yongseok (Email: yongpark@pitt.edu; Pitt Username: yongpark)
Committee Member: Weeks, Daniel (Email: weeks@pitt.edu; Pitt Username: weeks; ORCID: 0000-0001-9410-7228)
Date: 2 July 2021
Date Type: Publication
Defense Date: 17 June 2021
Approval Date: 2 July 2021
Submission Date: 26 June 2021
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 128
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: omics cluster analysis, cluster analysis, disease subtyping, variable selection, outcome association, precision medicine, integrative analysis, power calculation
Date Deposited: 02 Jul 2021 17:35
Last Modified: 02 Jul 2022 05:15

