Integration and missing data handling in multiple omics studies
Fang, Zhou (2018) Integration and missing data handling in multiple omics studies. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract
In modern high-throughput multi-omics data analysis, data integration and missing data handling are common problems in discovering regulatory mechanisms associated with complex diseases and in boosting power and accuracy. Moreover, in genotyping problems, the integration of linkage disequilibrium (LD) and identity-by-descent (IBD) information becomes essential for achieving universally superior performance. In pathway analysis, when multiple studies of different conditions are jointly analyzed, simultaneous discovery of differential and consensual pathways is valuable for knowledge discovery. This dissertation focuses on the development of a Bayesian multi-omics data integration model with missingness handling, a novel genotype imputation method incorporating both LD and IBD information, and a comparative pathway analysis integration method. In the first paper of this dissertation, inspired by the popular Integrative Bayesian Analysis of Genomics data (iBAG), we propose a full Bayesian model that allows incorporation of samples with missing omics data, together with a self-learning cross-validation (CV) decision scheme. Simulations and a real application to a childhood asthma dataset demonstrate the superior performance of the CV decision scheme under various types of missingness mechanisms. In the second paper, we propose a novel genotype inference method, namely LDIV, to integrate both LD and IBD information. To evaluate our approach, we simulated individuals in different family structures and at different sequencing depths. Results showed that LDIV could significantly increase genotype accuracy for family sequencing data. The third paper presents a meta-analytic integration tool, Comparative Pathway Integrator (CPI), to discover consensual and differential enrichment patterns, reduce pathway redundancy, and assist explanation of the pathway clusters with a novel text mining algorithm.
We applied CPI to jointly analyze six psychiatric disorder transcriptomic studies to demonstrate its effectiveness, and found both functions confirmed by previous biological studies and novel enrichment patterns.