Integration and missing data handling in multiple omics studies

Fang, Zhou (2018) Integration and missing data handling in multiple omics studies. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Full text: PDF, Submitted Version (1MB)

Abstract

In modern high-throughput multiple omics data analysis, data integration and missing data handling are common challenges in discovering regulatory mechanisms associated with complex diseases and in improving statistical power and accuracy. Moreover, in genotyping problems, integrating linkage disequilibrium (LD) and identity-by-descent (IBD) information becomes essential to achieve consistently superior performance. In pathway analysis, when multiple studies of different conditions are jointly analyzed, simultaneous discovery of differential and consensual pathways is valuable for knowledge discovery. This dissertation focuses on the development of a Bayesian multi-omics data integration model with missing data handling, a novel genotype imputation method incorporating both LD and IBD information, and a comparative pathway analysis integration method.

In the first paper of this dissertation, inspired by the popular Integrative Bayesian Analysis of Genomics data (iBAG), we propose a fully Bayesian model that allows incorporation of samples with missing omics data and includes a self-learning cross-validation (CV) decision scheme. Simulations and a real application to a child asthma dataset demonstrate superior performance of the CV decision scheme across various types of missingness mechanisms.

In the second paper, we propose a novel genotype inference method, LDIV, that integrates both LD and IBD information. To evaluate our approach, we simulated individuals with different family structures and sequencing depths. Results showed that LDIV could significantly increase genotyping accuracy for family sequencing data.

The third paper presents a meta-analytic integration tool, Comparative Pathway Integrator (CPI), to discover consensual and differential enrichment patterns, reduce pathway redundancy, and aid interpretation of the pathway clusters with a novel text mining algorithm. We applied CPI to jointly analyze six psychiatric disorder transcriptomic studies to demonstrate its effectiveness, and found both functions confirmed by previous biological studies and novel enrichment patterns.
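
The distinction between consensual and differential enrichment can be illustrated with a minimal sketch. This is not CPI's algorithm: it assumes a one-sided hypergeometric over-representation test per study and a fixed significance cutoff, and the function names (`hypergeom_pvalue`, `classify_pathway`), the cutoff `alpha`, and all gene counts are hypothetical.

```python
from math import comb


def hypergeom_pvalue(n_universe, n_pathway, n_de, n_overlap):
    """One-sided over-representation p-value, P(X >= n_overlap).

    X is the number of pathway genes among n_de differentially expressed
    genes drawn from a universe of n_universe genes, n_pathway of which
    belong to the pathway.
    """
    upper = min(n_pathway, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def classify_pathway(pvalues, alpha=0.05):
    """Label a pathway by its enrichment pattern across studies.

    Assumption for illustration only: a plain per-study cutoff, not the
    meta-analytic procedure used in the dissertation.
    """
    hits = [p < alpha for p in pvalues]
    if all(hits):
        return "consensual"
    if any(hits):
        return "differential"
    return "not enriched"


# Hypothetical counts for one pathway in two studies.
studies = [
    {"n_universe": 1000, "n_pathway": 40, "n_de": 100, "n_overlap": 12},
    {"n_universe": 1000, "n_pathway": 40, "n_de": 100, "n_overlap": 4},
]
pvalues = [hypergeom_pvalue(**s) for s in studies]
print(classify_pathway(pvalues))
```

A pathway enriched in every study is labeled consensual, and one enriched in only some studies is labeled differential. A hard cutoff like this ignores multiple testing and effect sizes, which is one reason a meta-analytic approach is preferable in practice.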

All three projects could have substantial public health importance. The method proposed in the first paper can be used for disease-causing biomarker selection and disease risk prediction. By handling missing data, higher statistical power and accuracy in clinical prediction and biomarker selection can be retained given a fixed budget and sample size. LDIV effectively increases genotyping accuracy. CPI helps to simultaneously discover the biological processes that function differentially and consensually across studies. It will also help scientists explore pathway findings with reduced redundancy and stronger statistical support.


Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Creators | Email | Pitt Username | ORCID
Fang, Zhou | zhf9@pitt.edu | zhf9 |
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Committee Chair | Tseng, George C. | ctseng@pitt.edu | ctseng |
Committee CoChair | Chen, Wei | wec47@pitt.edu | wec47 |
Committee Member | Tang, Gong | got1@pitt.edu | got1 |
Committee Member | Ding, Ying | yingding@pitt.edu | yingding |
Committee Member | Hu, Ming | afhuming@gmail.com | non-pitt |
Date: 17 September 2018
Date Type: Publication
Defense Date: 3 May 2018
Approval Date: 17 September 2018
Submission Date: 4 June 2018
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 123
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Multiple Omics Integration; Bayesian Imputation; Genotyping
Date Deposited: 17 Sep 2018 20:47
Last Modified: 17 Sep 2018 20:47
URI: http://d-scholarship.pitt.edu/id/eprint/34596
