Outcome-Guided Disease Subtyping and Power Calculation for High-Dimensional Omics Studies
Liu, Peng (2021) Outcome-Guided Disease Subtyping and Power Calculation for High-Dimensional Omics Studies. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract
With the rapid advancement of high-throughput technologies, a large amount of high-dimensional data has been generated in the public domain, which gives rise to various statistical and computational challenges in the design and analysis of omics experiments. This dissertation addresses disease subtyping (Chapters 2 and 3) and power calculation (Chapter 4) in the analysis of high-dimensional omics studies. In Chapter 2, we proposed an outcome-guided disease subgrouping framework called ogClust. Disease subtyping with omics data usually applies conventional clustering methods, which primarily aim to identify subpopulations with similar patterns in gene features. Because outcome information is not considered in clustering, the identified disease subtypes are often not associated with the outcome. ogClust uses a continuous or survival clinical outcome to guide the subtyping; it identifies disease subtypes together with their driving genes and guarantees that the resulting subtypes are associated with the disease of interest. In Chapter 3, we extended the ogClust model by integrating multi-omics data and incorporating biological information via the sparse overlapping group lasso to improve the accuracy and interpretability of feature selection and disease subtyping. An EM algorithm combined with the alternating direction method of multipliers (ADMM) is applied for fast optimization. In Chapter 4, we proposed a power calculation and study design method, "MethylSeqDesign", for bisulfite DNA methylation sequencing (Methyl-Seq) studies. A three-step sequential procedure performs genome-wide power calculation while simultaneously considering sample size and sequencing depth. The performance of the method was evaluated with extensive simulations, and two real examples were analyzed to illustrate the approach.
Contribution to public health: The methods proposed in Chapters 2 and 3 are useful for identifying outcome-associated clusters that are more likely to have distinct biological mechanisms or clinical significance, which is an essential first step towards precision medicine. The method proposed in Chapter 4 provides a useful tool for genome-wide power calculation and study design in Methyl-Seq studies.