Kang, Dongwan Don (2011) Statistical Issues in Combining Multiple Genomic Studies: Quality Assessment, Dimension Reduction and Integration of Transcriptomic and Phenomic Data. Doctoral Dissertation, University of Pittsburgh.
Abstract
Genomic meta-analysis has been applied to many biological problems to gain power from increased sample sizes and to validate results from individual studies. For study selection, however, most of the literature relies on qualitative or ad hoc numerical criteria, and there has been no effort to develop a rigorous quantitative evaluation framework. In this thesis, we first proposed several quantitative measures to assess the quality of a study for inclusion in a meta-analysis. We applied the proposed integrative criteria to multiple microarray studies to screen out inappropriate studies, and confirmed the necessity of proper exclusion criteria using real meta-analyses. Simulation studies showed the effectiveness and robustness of the proposed criteria. Second, we investigated simultaneous dimension reduction frameworks for downstream genomic meta-analysis. Currently, most microarray meta-analyses focus on detecting biomarkers; however, it is also valuable to explore meta-analysis in unsupervised or supervised machine learning, particularly dimension reduction when multiple studies are combined. We proposed several simultaneous dimension reduction methods based on principal component analysis (PCA). Using five real microarray data examples, we showed the information gain obtained from the proposed procedures in terms of better visualization and prediction accuracy. Third, we pursued a novel approach to elucidate undefined disease phenotypes between interstitial lung disease (ILD) and chronic obstructive pulmonary disease (COPD). By applying unsupervised learning techniques to both clinical phenotypes and gene expression data from a large, well-characterized cohort, we showed the existence of an intermediate phenotypic group that has characteristics of both diseases, and of divergent phenotypes in clinical and molecular features.
The public health importance of our findings is that current clinical definitions and classifications do not account for the large number of patients with intermediate phenotypes or less common features, who are often excluded from clinical trials and epidemiological reports.
Details
| Field | Value |
|---|---|
| Item Type: | University of Pittsburgh ETD |
| ETD Committee: | Committee Chair: Tseng, George (ctseng@pitt.edu); Committee Member: Weissfeld, Lisa (lweis@pitt.edu); Committee Member: Barmada, Michael (barmada@pitt.edu); Committee Member: Kaminski, Naftali (kaminx@UPMC.EDU) |
| Title: | Statistical Issues in Combining Multiple Genomic Studies: Quality Assessment, Dimension Reduction and Integration of Transcriptomic and Phenomic Data |
| Status: | Unpublished |
| Date: | 22 September 2011 |
| Date Type: | Completion |
| Defense Date: | 20 May 2011 |
| Approval Date: | 22 September 2011 |
| Submission Date: | 07 June 2011 |
| Access Restriction: | 5 year -- Restrict access to University of Pittsburgh for a period of 5 years. |
| Patent pending: | No |
| Institution: | University of Pittsburgh |
| Thesis Type: | Doctoral Dissertation |
| Refereed: | Yes |
| Degree: | PhD - Doctor of Philosophy |
| URN: | etd-06072011-145031 |
| Uncontrolled Keywords: | Quality Control; PCA; Dimension Reduction; Microar |
| Schools and Programs: | Graduate School of Public Health > Biostatistics |
| Date Deposited: | 10 Nov 2011 14:46 |
| Last Modified: | 20 Jan 2012 14:12 |
| Other ID: | http://etd.library.pitt.edu/ETD/available/etd-06072011-145031/, etd-06072011-145031 |