Statistical integrative omics methods for disease subtype discovery

Huo, Zhiguang (2017) Statistical integrative omics methods for disease subtype discovery. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

Disease phenotyping using omics data has become a popular approach that can potentially lead to better personalized treatment. Identifying disease subtypes via unsupervised machine learning is the first step towards this goal. With the accumulation of massive high-throughput omics data sets, omics data integration becomes essential to improve statistical power and reproducibility. In this dissertation, the sparse K-means method is extended in two directions.
The first extension is a meta-analytic framework for identifying novel disease subtypes when expression profiles from multiple cohorts are available. Lasso regularization combined with meta-analysis identifies a unique set of gene features for subtype characterization. Adding a pattern-matching reward function achieves consistency of subtype signatures across studies.
The second extension integrates multi-level omics datasets by incorporating prior biological knowledge through a sparse overlapping group lasso approach. An algorithm based on the alternating direction method of multipliers (ADMM) is applied for fast optimization.
For both topics, simulations and real applications in breast cancer and leukemia show superior clustering accuracy, feature selection and functional annotation. These methods improve the statistical power, prediction accuracy and reproducibility of disease subtype discovery analysis.
Contribution to public health: The proposed methods are able to identify disease subtypes from complex multi-level or multi-cohort omics data. Disease subtype definition is essential to delivering personalized medicine, since treating each subtype with its most appropriate medicine can achieve the most effective treatment and minimize side effects. Omics data can provide a better definition of disease subtypes than regular pathological approaches. By using multi-level or multi-cohort omics data, we gain statistical power and reproducibility, and the resulting subtype definition is more reliable and convincing than one from a single-study analysis.
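
As background to the abstract, the feature-weight step of the baseline sparse K-means method can be sketched as follows. This is a minimal illustration of the commonly used formulation (non-negative weights with an L2 norm of at most 1 and an L1 norm of at most a tuning bound), not the dissertation's meta-analytic or group lasso extensions. The function names, the `bcss` input (per-feature between-cluster sums of squares) and the `l1_bound` parameter are assumptions chosen for this example.

```python
import math


def soft_threshold(values, delta):
    """Shrink non-negative values toward zero by delta, clipping at zero."""
    return [max(v - delta, 0.0) for v in values]


def l2_normalize(values):
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def update_feature_weights(bcss, l1_bound, tol=1e-8, max_iter=200):
    """Weights maximizing sum_j w_j * bcss_j subject to
    w_j >= 0, ||w||_2 <= 1 and ||w||_1 <= l1_bound.

    l1_bound is assumed to lie between 1 and sqrt(number of features).
    """
    a = [max(v, 0.0) for v in bcss]
    w = l2_normalize(a)
    if sum(w) <= l1_bound:
        return w  # the L1 constraint is inactive, no thresholding needed

    # Binary search for the smallest threshold that satisfies the L1 bound.
    lo, hi = 0.0, max(a)
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        w = l2_normalize(soft_threshold(a, mid))
        if sum(w) > l1_bound:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return l2_normalize(soft_threshold(a, hi))
```

In a full sparse K-means loop this step would alternate with a weighted K-means clustering step; features whose between-cluster separation falls below the threshold receive a weight of exactly zero, which is what produces gene selection.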

Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Huo, Zhiguang	xiaoguang1988@gmail.com	zhh18	0000-0002-8032-4392

ETD Committee:

Title	Member	Email Address
Committee Chair	Tseng, George	ctseng@pitt.edu
Committee Member	Park, YongSeok	yongpark@pitt.edu
Committee Member	Wahed, Abdus	wahed@pitt.edu
Committee Member	Anderson, Stewart	sja@pitt.edu
Committee Member	Ren, Zhao	zren@pitt.edu

Date:

29 June 2017

Date Type:

Publication

Defense Date:

30 March 2017

Approval Date:

29 June 2017

Submission Date:

31 March 2017

Access Restriction:

1 year -- Restrict access to University of Pittsburgh for a period of 1 year.

Number of Pages:

Institution:

University of Pittsburgh

Schools and Programs:

School of Public Health > Biostatistics

Degree:

PhD - Doctor of Philosophy

Thesis Type:

Doctoral Dissertation

Refereed:

Yes

Uncontrolled Keywords:

integrative omics methods, disease subtype discovery

Date Deposited:

29 Jun 2017 23:42

Last Modified:

01 May 2019 05:15

URI:

http://d-scholarship.pitt.edu/id/eprint/31110
