Huo, Zhiguang
(2017)
Statistical integrative omics methods for disease subtype discovery.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Disease phenotyping using omics data has become a popular approach that can potentially lead to better personalized treatment. Identifying disease subtypes via unsupervised machine learning is the first step towards this goal. With the accumulation of massive high-throughput omics data sets, omics data integration becomes essential to improve statistical power and reproducibility. In this dissertation, the sparse K-means method will be extended in two directions.
The first extension is a meta-analytic framework to identify novel disease subtypes when expression profiles from multiple cohorts are available. Lasso regularization combined with meta-analysis can identify a unique set of gene features for subtype characterization. By adding a pattern-matching reward function, consistency of subtype signatures across studies can be achieved.
The second extension integrates multi-level omics datasets by incorporating prior biological knowledge through a sparse overlapping group lasso approach. An algorithm based on the alternating direction method of multipliers (ADMM) will be applied for fast optimization.
For both topics, simulations and real applications in breast cancer and leukemia will show superior clustering accuracy, feature selection, and functional annotation. These methods will improve the statistical power, prediction accuracy, and reproducibility of disease subtype discovery analysis.
Contribution to public health: The proposed methods are able to identify disease subtypes from complex multi-level or multi-cohort omics data. Disease subtype definition is essential to delivering personalized medicine, since treating each subtype with its most appropriate medicine will achieve the most effective treatment and eliminate side effects. Omics data can provide a better definition of disease subtypes than regular pathological approaches. By integrating multi-level or multi-cohort omics data, we are able to gain statistical power and reproducibility, and the resulting subtype definition is much more reliable, convincing, and reproducible than that from a single-study analysis.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Huo, Zhiguang
Date: 29 June 2017
Date Type: Publication
Defense Date: 30 March 2017
Approval Date: 29 June 2017
Submission Date: 31 March 2017
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 96
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: integrative omics methods, disease subtype discovery
Date Deposited: 29 Jun 2017 23:42
Last Modified: 01 May 2019 05:15
URI: http://d-scholarship.pitt.edu/id/eprint/31110