
Statistical Issues in Combining Multiple Genomic Studies: Quality Assessment, Dimension Reduction and Integration of Transcriptomic and Phenomic Data

Kang, Dongwan Don (2011) Statistical Issues in Combining Multiple Genomic Studies: Quality Assessment, Dimension Reduction and Integration of Transcriptomic and Phenomic Data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Genomic meta-analysis has been applied to many biological problems to gain power from increased sample sizes and to validate the results of individual studies. For study selection criteria, however, most of the literature relies on qualitative or ad hoc numerical methods, and there has been no effort to develop a rigorous quantitative evaluation framework. In this thesis, we proposed several quantitative measures to assess the quality of a study for a meta-analysis. We applied the proposed integrative criteria to multiple microarray studies to screen out inappropriate studies, and we confirmed the necessity of proper exclusion criteria using real meta-analyses. Through simulation studies, we showed the effectiveness and robustness of the proposed criteria. Secondly, we investigated simultaneous dimension reduction frameworks for downstream genomic meta-analysis. Currently, most microarray meta-analyses focus on detecting biomarkers; however, it is also valuable to explore meta-analysis in unsupervised or supervised machine learning, particularly dimension reduction when multiple studies are combined. We proposed several simultaneous dimension reduction methods using principal component analysis (PCA). Using five examples of real microarray data, we showed the information gain obtained by adopting our proposed procedures in terms of better visualization and prediction accuracy. In the third component, we pursued a novel approach to elucidate undefined disease phenotypes between interstitial lung disease (ILD) and chronic obstructive pulmonary disease (COPD). By applying unsupervised learning techniques to both clinical phenotypes and gene expression data obtained from a large, well-characterized cohort, we showed the existence of an intermediate phenotypic group with characteristics of both diseases and with divergent phenotypes in clinical and molecular features.
The public health importance of our findings is that current clinical definitions and classifications do not account for the large number of patients who have intermediate phenotypes or less common features and who are often excluded from clinical trials and epidemiology reports.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Kang, Dongwan
ETD Committee:
Committee Chair: Tseng, George (Email: ctseng@pitt.edu; Pitt Username: CTSENG)
Committee Member: Weissfeld, Lisa (Email: lweis@pitt.edu; Pitt Username: LWEIS)
Committee Member: Barmada, Michael (Email: barmada@pitt.edu; Pitt Username: BARMADA)
Committee Member: Kaminski, Naftali (Email: kaminx@UPMC.EDU)
Date: 22 September 2011
Date Type: Completion
Defense Date: 20 May 2011
Approval Date: 22 September 2011
Submission Date: 7 June 2011
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Quality Control; PCA; Dimension Reduction; Microar
Other ID: etd-06072011-145031
Date Deposited: 10 Nov 2011 19:46
Last Modified: 15 Nov 2016 13:44

