
Statistical integrative omics methods for disease subtype discovery

Huo, Zhiguang (2017) Statistical integrative omics methods for disease subtype discovery. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Submitted Version



Disease phenotyping using omics data has become a popular approach that can potentially lead to better personalized treatment. Identifying disease subtypes via unsupervised machine learning is the first step towards this goal. With the accumulation of massive high-throughput omics data sets, omics data integration has become essential for improving statistical power and reproducibility. In this dissertation, the sparse K-means method is extended in two directions.
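
Both extensions start from sparse K-means. As a reference point, the following NumPy sketch implements the standard single-study algorithm of Witten and Tibshirani (2010) as we understand it; it is not the dissertation's code. Clusters are found by K-means on features scaled by the square roots of the weights w, and w is then updated by soft-thresholding the per-feature between-cluster sum of squares (BCSS) subject to ||w||_2 <= 1, ||w||_1 <= s and w >= 0. All function names and the simulated data are hypothetical.

```python
import numpy as np


def soft_threshold(a, delta):
    """Elementwise soft-thresholding: sign(a) * max(|a| - delta, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def _normalize(v):
    """Scale v to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def update_weights(bcss, s, n_iter=50):
    """Maximize w @ bcss s.t. ||w||_2 <= 1, ||w||_1 <= s, w >= 0 (s in [1, sqrt(p)])."""
    a = np.maximum(bcss, 0.0)
    w = _normalize(a)
    if w.sum() <= s:
        return w  # L1 constraint is inactive: no shrinkage needed
    lo, hi = 0.0, a.max()
    for _ in range(n_iter):  # binary search for the soft-threshold level
        delta = (lo + hi) / 2.0
        if _normalize(soft_threshold(a, delta)).sum() > s:
            lo = delta
        else:
            hi = delta
    return _normalize(soft_threshold(a, hi))


def bcss_per_feature(x, labels):
    """Between-cluster sum of squares of each feature: total SS minus within-cluster SS."""
    tss = ((x - x.mean(axis=0)) ** 2).sum(axis=0)
    wss = np.zeros(x.shape[1])
    for c in np.unique(labels):
        xc = x[labels == c]
        wss += ((xc - xc.mean(axis=0)) ** 2).sum(axis=0)
    return tss - wss


def weighted_kmeans(x, w, k, rng, n_init=10, n_iter=100):
    """Lloyd's K-means on x * sqrt(w), which minimizes sum_j w_j * WCSS_j."""
    z = x * np.sqrt(w)
    best_labels, best_obj = None, np.inf
    for _ in range(n_init):  # random restarts; keep the lowest objective
        centers = z[rng.choice(len(z), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new_centers = np.array([z[labels == c].mean(axis=0) if np.any(labels == c)
                                    else centers[c] for c in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        obj = d2[np.arange(len(z)), labels].sum()
        if obj < best_obj:
            best_labels, best_obj = labels, obj
    return best_labels


def sparse_kmeans(x, k, s, n_outer=20, seed=0):
    """Alternate cluster assignment (weights fixed) and weight update (clusters fixed)."""
    rng = np.random.default_rng(seed)
    w = np.full(x.shape[1], 1.0 / np.sqrt(x.shape[1]))  # start from equal weights
    for _ in range(n_outer):
        labels = weighted_kmeans(x, w, k, rng)
        w_new = update_weights(bcss_per_feature(x, labels), s)
        converged = np.abs(w_new - w).sum() / np.abs(w).sum() < 1e-4
        w = w_new
        if converged:
            break
    return labels, w


# Usage on simulated data: only the first two of 20 features separate the two groups.
data_rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 30)
x = data_rng.normal(size=(60, 20))
x[truth == 1, :2] += 6.0
labels, w = sparse_kmeans(x, k=2, s=1.5)
```

The L1 bound s (between 1 and √p) controls how strongly the weights are shrunk; the lasso-type soft-thresholding in `update_weights` is what yields a sparse feature set, with uninformative features receiving small or exactly zero weight.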
The first extension is a meta-analytic framework to identify novel disease subtypes when expression profiles from multiple cohorts are available. Combining lasso regularization with meta-analysis identifies a unique set of gene features for subtype characterization, and adding a pattern-matching reward function achieves consistent subtype signatures across studies.
The second extension integrates multi-level omics datasets by incorporating prior biological knowledge through a sparse overlapping group lasso approach. An algorithm based on the alternating direction method of multipliers (ADMM) is applied for fast optimization.
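
To illustrate how ADMM copes with overlapping groups, the sketch below solves a sparse overlapping group lasso problem. It is not the dissertation's algorithm: to stay short it swaps in a least-squares loss instead of the sparse K-means objective and uses the penalty lam1 * ||b||_1 + lam2 * sum_g ||b_g||_2 with unweighted group terms. The name `sog_lasso_admm`, the `groups` format, the default `rho` and the stopping rule are illustrative assumptions. Each penalty term gets its own copy of the coefficients it touches, and ADMM enforces agreement between the copies, so every proximal step is a closed-form shrinkage.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2 (shrinks the whole group toward zero)."""
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1.0 - t / norm) * v


def sog_lasso_admm(x, y, groups, lam1, lam2, rho=100.0, n_iter=2000, tol=1e-8):
    """ADMM for 0.5*||y - x b||^2 + lam1*||b||_1 + lam2*sum_g ||b_g||_2, groups may overlap.

    Block 0 is a copy of all coefficients (for the L1 term); block g >= 1 is a
    copy of group g. Consensus constraints z_b = beta[idx_b] tie the copies together.
    """
    p = x.shape[1]
    blocks = [np.arange(p)] + [np.asarray(g, dtype=int) for g in groups]
    counts = np.zeros(p)
    for idx in blocks:
        counts[idx] += 1.0  # number of copies of each coefficient
    system = x.T @ x + rho * np.diag(counts)  # fixed across iterations
    xty = x.T @ y
    z = [np.zeros(len(idx)) for idx in blocks]
    u = [np.zeros(len(idx)) for idx in blocks]  # scaled dual variables
    beta = np.zeros(p)
    for _ in range(n_iter):
        # beta-update: linear solve pulling beta toward every copy z_b - u_b
        rhs = xty.copy()
        for idx, zb, ub in zip(blocks, z, u):
            rhs[idx] += rho * (zb - ub)
        beta = np.linalg.solve(system, rhs)
        primal_sq, change_sq = 0.0, 0.0
        for b, idx in enumerate(blocks):
            v = beta[idx] + u[b]
            if b == 0:
                z_new = soft_threshold(v, lam1 / rho)
            else:
                z_new = group_soft_threshold(v, lam2 / rho)
            change_sq += np.sum((z_new - z[b]) ** 2)
            z[b] = z_new
            u[b] += beta[idx] - z[b]  # dual ascent on the consensus constraint
            primal_sq += np.sum((beta[idx] - z[b]) ** 2)
        # stop when copies agree with beta and the copies have stopped moving
        if np.sqrt(primal_sq) < tol and np.sqrt(change_sq) < tol:
            break
    return beta
```

With lam1 = lam2 = 0 the iterates reduce to ordinary least squares, and a very large lam2 drives every grouped coefficient to zero; both limits make convenient sanity checks.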
For both topics, simulations and real applications in breast cancer and leukemia demonstrate superior clustering accuracy, feature selection, and functional annotation. These methods improve the statistical power, prediction accuracy, and reproducibility of disease subtype discovery analysis.
Contribution to public health: The proposed methods can identify disease subtypes from complex multi-level or multi-cohort omics data. Defining disease subtypes is essential for delivering personalized medicine, since treating each subtype with its most appropriate therapy achieves the most effective treatment and reduces side effects. Omics data can provide better definitions of disease subtypes than conventional pathological approaches. By combining multi-level or multi-cohort omics data, we gain statistical power and reproducibility, and the resulting subtype definitions are much more reliable, convincing, and reproducible than those from single-study analysis.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Huo, Zhiguang (Email: xiaoguang1988@gmail.com; Pitt Username: zhh18; ORCID: 0000-0002-8032-4392)
ETD Committee:
Committee Chair: Tseng,
Committee Member: Park,
Committee Member: Wahed,
Committee Member: Anderson,
Committee Member: Ren,
Date: 29 June 2017
Date Type: Publication
Defense Date: 30 March 2017
Approval Date: 29 June 2017
Submission Date: 31 March 2017
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 96
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: integrative omics methods, disease subtype discovery
Date Deposited: 29 Jun 2017 23:42
Last Modified: 01 May 2019 05:15

