STATISTICAL ISSUES IN META-ANALYSIS FOR IDENTIFYING SIGNATURE GENES IN THE INTEGRATION OF MULTIPLE GENOMIC STUDIES

Li, Jia (2009) STATISTICAL ISSUES IN META-ANALYSIS FOR IDENTIFYING SIGNATURE GENES IN THE INTEGRATION OF MULTIPLE GENOMIC STUDIES. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

With the availability of large numbers of expression profiles, the need for meta-analyses that integrate different types of microarray data is obvious. For the detection of differentially expressed genes, most current efforts focus on comparing and evaluating gene lists obtained from each individual data set. Several statistical meta-analysis methods, including Fisher's method and the random effects model, have been proposed, but the statistical framework is rarely formulated rigorously enough for evaluation and comparison. In this dissertation, we formulate meta-analysis in genomic studies and develop systematic integration methods for two-class and multi-class studies.

First, we tackle two frequently asked biological questions: "Which genes are significant in one or more data sets?" and "Which genes are significant in all data sets?". We formulate the two corresponding statistical hypothesis settings, propose an optimally weighted statistic, and compare it with Fisher's classical equally weighted statistic and Tippett's minimum p-value statistic. In general no uniformly most powerful test exists, and we show that all three methods are admissible under simplified Gaussian assumptions. Furthermore, the optimally weighted statistic retains the advantages of both classical methods and performs consistently well under the extreme alternative hypotheses in which each classical method performs poorly. The optimal weights also provide a natural categorization of the detected genes that facilitates further biological investigation. We demonstrate the advantages of the optimally weighted statistic through power analysis, simulations, and two real-data applications: combining multi-tissue mouse energy metabolism data sets and combining prostate cancer data sets.

Second, we propose two methods for identifying biomarkers with concordant expression patterns across studies when each study contains more than two classes. Published meta-analysis methods for this purpose have so far mostly considered two-class comparisons.
Methods for combining multi-class studies and assessing pattern concordance have rarely been explored. We first consider a natural extension: combining p-values from the traditional ANOVA model. Because ANOVA p-values do not reflect pattern information, we then propose a multi-class correlation measure (MCC), under an equal-weight bivariate mixture model, that specifically targets biomarkers with concordant patterns across a pair of studies. For both approaches, we focus on identifying biomarkers that are differentially expressed in all studies (ANOVA-maxP and min-MCC). Both ANOVA-maxP and min-MCC are evaluated by simulation studies and by applications to a multi-tissue mouse metabolism data set and a multi-platform mouse trauma data set.

Finally, we develop an R package, genomeMeta, which visualizes and summarizes the biomarkers identified by the methods described and proposed in this dissertation.

This work could improve public health by providing more effective methodologies for biomarker detection in the integration of multiple genomic studies.
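The classical p-value combination rules named in the abstract can each be computed in a few lines. The following Python sketch is illustrative only and is not taken from genomeMeta or the dissertation: it assumes K independent studies each contribute one p-value for a gene (in the multi-class setting, for example, a per-study one-way ANOVA p-value) and uses the standard null distributions of Fisher's statistic, Tippett's minimum p-value, and the maximum p-value. Reading ANOVA-maxP as "the maximum of the per-study ANOVA p-values" is an assumption based on its name, and all function names are hypothetical.

```python
import math
from typing import Sequence


def chi2_sf_even_df(x: float, k: int) -> float:
    """P(X > x) for X ~ chi-square with 2k degrees of freedom.

    For even degrees of freedom the survival function has the closed form
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!, so no external library is needed.
    """
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


def fisher_combined(pvals: Sequence[float]) -> float:
    """Fisher's equally weighted statistic -2 * sum(log p); chi-square with 2K df under H0."""
    stat = -2.0 * sum(math.log(p) for p in pvals)
    return chi2_sf_even_df(stat, len(pvals))


def tippett_minp(pvals: Sequence[float]) -> float:
    """Tippett's minimum p-value; under H0, P(min <= m) = 1 - (1 - m)^K."""
    return 1.0 - (1.0 - min(pvals)) ** len(pvals)


def maxp_combined(pvals: Sequence[float]) -> float:
    """Maximum p-value; under H0, P(max <= m) = m^K. Rewards genes small in every study."""
    return max(pvals) ** len(pvals)


if __name__ == "__main__":
    one_study = [1e-6, 0.9, 0.9]      # strong signal in a single study
    all_studies = [0.02, 0.02, 0.02]  # moderate signal in every study
    for name, p in [("one study", one_study), ("all studies", all_studies)]:
        print(name, fisher_combined(p), tippett_minp(p), maxp_combined(p))
```

With three studies, a gene with p = 0.02 in every study receives a maxP value of 0.02^3 = 8e-6 but a Tippett value of 1 - 0.98^3 (about 0.059), while a gene significant in a single study is favoured by minP. This mirrors the two questions "significant in one or more data sets" versus "significant in all data sets".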
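The idea of an optimally weighted statistic can be sketched in the same style, but the abstract does not define the weights, so this sketch rests on an explicit assumption: each weight is 0 or 1, so a weight vector simply selects a subset of studies, and the Fisher statistic over that subset is referred to a chi-square distribution with twice as many degrees of freedom as selected studies. The hypothetical function below returns the smallest such p-value together with the weight vector that attains it; that weight vector plays the role of the gene categorization described above. Calibrating the minimum itself (for example by permutation) is outside this sketch.

```python
import itertools
import math
from typing import Optional, Sequence, Tuple


def chi2_sf_even_df(x: float, k: int) -> float:
    """P(X > x) for X ~ chi-square with 2k degrees of freedom (closed form)."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def optimally_weighted(pvals: Sequence[float]) -> Tuple[float, Optional[Tuple[int, ...]]]:
    """Search all 0/1 weight vectors (an ASSUMPTION) and keep the most significant one.

    For weights w, -2 * sum_k w_k log p_k is chi-square with 2 * sum(w) df under H0.
    The returned minimum is NOT a calibrated p-value; its null distribution
    would have to be obtained separately.
    """
    best_p = 1.0
    best_w: Optional[Tuple[int, ...]] = None
    for w in itertools.product((0, 1), repeat=len(pvals)):
        size = sum(w)
        if size == 0:
            continue  # the empty subset carries no evidence
        stat = -2.0 * sum(math.log(p) for p, wk in zip(pvals, w) if wk)
        p_w = chi2_sf_even_df(stat, size)
        if best_w is None or p_w < best_p:
            best_p, best_w = p_w, w
    return best_p, best_w


if __name__ == "__main__":
    print(optimally_weighted([1e-6, 0.9, 0.9]))    # weights select only the first study
    print(optimally_weighted([0.02, 0.02, 0.02]))  # weights select all three studies
```

The exhaustive loop is kept for clarity. For a fixed number of selected studies the statistic is largest when the smallest p-values are chosen, so in practice only K candidate weight vectors (the k smallest p-values, for k = 1, ..., K) need to be checked.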


Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Li, Jia (jiajiaysc@gmail.com)
ETD Committee:
Committee Chair: Tseng, George C (ctseng@pitt.edu; Pitt username: CTSENG)
Committee Member: Weissfeld, Lisa (lweis@pitt.edu; Pitt username: LWEIS)
Committee Member: Mazumdar, Sati (maz1@pitt.edu; Pitt username: MAZ1)
Committee Member: Gopalakrishnan, Vanathi (vanathi@pitt.edu; Pitt username: VANATHI)
Date: 29 January 2009
Date Type: Completion
Defense Date: 11 September 2008
Approval Date: 29 January 2009
Submission Date: 25 November 2008
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Institution: University of Pittsburgh
Schools and Programs: Graduate School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: microarray; optimally-weighted statistic; meta-analysis; minimum multi-class correlation statistic
Other ID: http://etd.library.pitt.edu/ETD/available/etd-11252008-234338/, etd-11252008-234338
Date Deposited: 10 Nov 2011 20:06
Last Modified: 15 Nov 2016 13:52
URI: http://d-scholarship.pitt.edu/id/eprint/9799
