Pitt Logo LinkContact Us

Statistical Issues in Combining Multiple Genomic Studies: Quality Assessment, Dimension Reduction and Integration of Transcriptomic and Phenomic Data

Kang, Dongwan Don (2011) Statistical Issues in Combining Multiple Genomic Studies: Quality Assessment, Dimension Reduction and Integration of Transcriptomic and Phenomic Data. Doctoral Dissertation, University of Pittsburgh.

[img] PDF - Primary Text
Restricted to University of Pittsburgh users only until 22 September 2016.

Download (8Mb) | Request a copy

    Abstract

    Genomic meta-analysis has been applied to many biological problems to gain more power from increased sample sizes and to validate the result from an individual study. As for the study selection criteria, however, most literatures depend on qualitative or ad-hoc numerical methods, and there has not been an effort to develop a rigorous quantitative evaluation framework. In this thesis, we proposed several quantitative measures to assess the quality of a study for a meta-analysis. We have applied the proposed integrative criteria to multiple microarray studies to screen out inappropriate studies and also confirmed the necessity of proper exclusion criteria using real meta-analyses. By simulation studies, we showed the effectiveness and robustness of the proposed criteria. Secondly, we have investigated simultaneous dimension reduction frameworks for down-stream genomic meta-analysis. Currently, most microarray meta-analyses focus on detecting biomarkers; however, it is also valuable to seek a possibility of meta-analysis in unsupervised or supervised machine learning, particularly dimension reduction when multiple studies are combined. We proposed several simultaneous dimension reduction methods using principal component analysis (PCA). Using five examples of real microarray data, we showed the information gain obtained by adopting our proposed procedures in terms of better visualization and prediction accuracy. In the third component, we pursued a novel approach to elucidate undefined disease phenotypes between interstitial lung disease (ILD) or chronic obstructive pulmonary disease (COPD). By applying unsupervised learning technique to both clinical phenotypes and gene expression data obtained from well characterized large number of cohort, we successfully showed the existence of intermediate phenotypic group who have both disease characteristics and divergent phenotypes in clinical and molecular features. Public health importance of our findings is that we showed current clinical definitions and classification do not account for the large number of patients having intermediate phenotypes or less common features that are often excluded from clinical trials and epidemiology reports.


    Share

    Citation/Export:
    Social Networking:

    Details

    Item Type: University of Pittsburgh ETD
    Creators/Authors:
    CreatorsEmailORCID
    Kang, Dongwan Dondonkang75@gmail.com
    ETD Committee:
    ETD Committee TypeCommittee MemberEmailORCID
    Committee ChairTseng, Georgectseng@pitt.edu
    Committee MemberWeissfeld, Lisalweis@pitt.edu
    Committee MemberBarmada, Michaelbarmada@pitt.edu
    Committee MemberKaminski, Naftalikaminx@UPMC.EDU
    Title: Statistical Issues in Combining Multiple Genomic Studies: Quality Assessment, Dimension Reduction and Integration of Transcriptomic and Phenomic Data
    Status: Unpublished
    Abstract: Genomic meta-analysis has been applied to many biological problems to gain more power from increased sample sizes and to validate the result from an individual study. As for the study selection criteria, however, most literatures depend on qualitative or ad-hoc numerical methods, and there has not been an effort to develop a rigorous quantitative evaluation framework. In this thesis, we proposed several quantitative measures to assess the quality of a study for a meta-analysis. We have applied the proposed integrative criteria to multiple microarray studies to screen out inappropriate studies and also confirmed the necessity of proper exclusion criteria using real meta-analyses. By simulation studies, we showed the effectiveness and robustness of the proposed criteria. Secondly, we have investigated simultaneous dimension reduction frameworks for down-stream genomic meta-analysis. Currently, most microarray meta-analyses focus on detecting biomarkers; however, it is also valuable to seek a possibility of meta-analysis in unsupervised or supervised machine learning, particularly dimension reduction when multiple studies are combined. We proposed several simultaneous dimension reduction methods using principal component analysis (PCA). Using five examples of real microarray data, we showed the information gain obtained by adopting our proposed procedures in terms of better visualization and prediction accuracy. In the third component, we pursued a novel approach to elucidate undefined disease phenotypes between interstitial lung disease (ILD) or chronic obstructive pulmonary disease (COPD). By applying unsupervised learning technique to both clinical phenotypes and gene expression data obtained from well characterized large number of cohort, we successfully showed the existence of intermediate phenotypic group who have both disease characteristics and divergent phenotypes in clinical and molecular features. Public health importance of our findings is that we showed current clinical definitions and classification do not account for the large number of patients having intermediate phenotypes or less common features that are often excluded from clinical trials and epidemiology reports.
    Date: 22 September 2011
    Date Type: Completion
    Defense Date: 20 May 2011
    Approval Date: 22 September 2011
    Submission Date: 07 June 2011
    Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
    Patent pending: No
    Institution: University of Pittsburgh
    Thesis Type: Doctoral Dissertation
    Refereed: Yes
    Degree: PhD - Doctor of Philosophy
    URN: etd-06072011-145031
    Uncontrolled Keywords: Quality Control; PCA; Dimension Reduction; Microar
    Schools and Programs: Graduate School of Public Health > Biostatistics
    Date Deposited: 10 Nov 2011 14:46
    Last Modified: 20 Jan 2012 14:12
    Other ID: http://etd.library.pitt.edu/ETD/available/etd-06072011-145031/, etd-06072011-145031

    Actions (login required)

    View Item

    Document Downloads