Kang, Dongwan Don
(2011)
Statistical Issues in Combining Multiple Genomic Studies: Quality Assessment, Dimension Reduction and Integration of Transcriptomic and Phenomic Data.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Genomic meta-analysis has been applied to many biological problems to gain power from increased sample sizes and to validate results from individual studies. For study selection criteria, however, most of the literature relies on qualitative or ad hoc numerical methods, and there has been no effort to develop a rigorous quantitative evaluation framework. In this thesis, we first proposed several quantitative measures to assess the quality of a study for a meta-analysis. We applied the proposed integrative criteria to multiple microarray studies to screen out inappropriate studies, and confirmed the necessity of proper exclusion criteria using real meta-analyses. Through simulation studies, we showed the effectiveness and robustness of the proposed criteria. Second, we investigated simultaneous dimension reduction frameworks for downstream genomic meta-analysis. Currently, most microarray meta-analyses focus on detecting biomarkers; however, it is also valuable to explore the possibility of meta-analysis in unsupervised or supervised machine learning, particularly dimension reduction when multiple studies are combined. We proposed several simultaneous dimension reduction methods using principal component analysis (PCA). Using five examples of real microarray data, we showed the information gain obtained by adopting our proposed procedures in terms of better visualization and prediction accuracy. In the third component, we pursued a novel approach to elucidate undefined disease phenotypes between interstitial lung disease (ILD) and chronic obstructive pulmonary disease (COPD). By applying unsupervised learning techniques to both clinical phenotypes and gene expression data obtained from a large, well-characterized cohort, we showed the existence of an intermediate phenotypic group that has characteristics of both diseases and divergent phenotypes in clinical and molecular features.
The public health importance of our findings is that current clinical definitions and classifications do not account for the large number of patients with intermediate phenotypes or less common features, who are often excluded from clinical trials and epidemiology reports.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Date: 22 September 2011
Date Type: Completion
Defense Date: 20 May 2011
Approval Date: 22 September 2011
Submission Date: 7 June 2011
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Quality Control; PCA; Dimension Reduction; Microar
Other ID: http://etd.library.pitt.edu/ETD/available/etd-06072011-145031/, etd-06072011-145031
Date Deposited: 10 Nov 2011 19:46
Last Modified: 15 Nov 2016 13:44
URI: http://d-scholarship.pitt.edu/id/eprint/8036