Integration and missing data handling in multiple omics studies

Fang, Zhou (2018) Integration and missing data handling in multiple omics studies. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

In modern high-throughput multi-omics data analysis, data integration and missing data handling are common problems, both in discovering regulatory mechanisms associated with complex diseases and in boosting statistical power and accuracy. Moreover, in genotyping problems, the integration of linkage disequilibrium (LD) and identity-by-descent (IBD) information becomes essential to reach universally superior performance. In pathway analysis, when multiple studies of different conditions are jointly analyzed, simultaneous discovery of differential and consensual pathways is valuable for knowledge discovery. This dissertation focuses on the development of a Bayesian multi-omics data integration model with missing data handling, a novel genotype imputation method incorporating both LD and IBD information, and a comparative pathway analysis integration method.

In the first paper of this dissertation, inspired by the popular Integrative Bayesian Analysis of Genomics data (iBAG), we propose a fully Bayesian model that allows incorporation of samples with missing omics data, together with a self-learning cross-validation (CV) decision scheme. Simulations and a real application to a child asthma dataset demonstrate superior performance of the CV decision scheme across the various types of missingness mechanisms evaluated.

In the second paper, we propose a novel genotype inference method, LDIV, that integrates both LD and IBD information. To evaluate our approach, we simulated individuals in different family structures and at different sequencing depths. Results showed that LDIV could significantly increase genotype accuracy for family sequencing data.

The third paper presents a meta-analytic integration tool, Comparative Pathway Integrator (CPI), to discover consensual and differential enrichment patterns, reduce pathway redundancy, and aid interpretation of the pathway clusters with a novel text mining algorithm. We applied CPI to jointly analyze six psychiatric disorder transcriptomic studies to demonstrate its effectiveness, and found both functions confirmed by previous biological studies and novel enrichment patterns.

All three projects could have substantial public health importance. The method proposed in the first paper can be used for disease-causing biomarker selection and disease risk prediction. By handling missing data, higher statistical power and accuracy in clinical prediction and biomarker selection can be retained given a fixed budget and sample size. LDIV effectively increases genotyping accuracy. CPI helps to simultaneously discover the biological processes that function differentially and consensually across studies, and it also assists scientists in exploring pathway findings with reduced redundancy and stronger statistical support.

Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Fang, Zhou	zhf9@pitt.edu	zhf9

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee Chair	Tseng, George C.	ctseng@pitt.edu	ctseng
Committee CoChair	Chen, Wei	wec47@pitt.edu	wec47
Committee Member	Tang, Gong	got1@pitt.edu	got1
Committee Member	Ding, Ying	yingding@pitt.edu	yingding
Committee Member	Hu, Ming	afhuming@gmail.com	non-pitt

Date:

17 September 2018

Date Type:

Publication

Defense Date:

3 May 2018

Approval Date:

17 September 2018

Submission Date:

4 June 2018

Access Restriction:

No restriction; Release the ETD for access worldwide immediately.

Number of Pages:

123

Institution:

University of Pittsburgh

Schools and Programs:

School of Public Health > Biostatistics

Degree:

PhD - Doctor of Philosophy

Thesis Type:

Doctoral Dissertation

Refereed:

Yes

Uncontrolled Keywords:

Multiple Omics Integration; Bayesian Imputation; Genotyping

Date Deposited:

17 Sep 2018 20:47

Last Modified:

17 Sep 2018 20:47

URI:

http://d-scholarship.pitt.edu/id/eprint/34596
