
Investigations on genomic meta-analysis: imputation for incomplete data and properties of adaptively weighted fisher's method

Tang, Shaowu (2014) Investigations on genomic meta-analysis: imputation for incomplete data and properties of adaptively weighted fisher's method. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Submitted Version



Microarray analysis, which monitors expression activity in thousands of genes simultaneously, has become a routine experiment in biomedical research during the past decade. The expression data generated by these high-throughput experiments may consist of thousands of variables and therefore pose great challenges to researchers pursuing a wide variety of objectives. A commonly encountered problem is detecting genes that are differentially expressed between two or more conditions, which is the major concern of this thesis.
In the first part of the thesis, we consider imputation of incomplete data in transcriptomic meta-analysis. In the past decade, a tremendous number of expression profiles have been generated and stored in the public domain, and information integration by meta-analysis to detect differentially expressed (DE) genes has become popular as a way to obtain increased statistical power and validated findings. Methods that combine p-values have been widely used in such genomic settings, among which Fisher's, Stouffer's, minP and maxP methods are the most popular. In practice, the raw data or genome-wide p-values of DE evidence needed for combination are often not available. Instead, only the lists of detected DE genes under a certain p-value threshold (e.g. DE genes with p-value < 0.001) are reported in journal publications. This truncated p-value information rules out the aforementioned meta-analysis methods, and researchers are forced to apply the less efficient vote counting method or to naively drop the studies with incomplete information. In this thesis, effective imputation methods were derived for such situations with partially censored p-values. We developed and compared three imputation methods (mean imputation, single random imputation and multiple imputation) for a general class of evidence aggregation methods, of which the Fisher, Stouffer and logit methods are special cases. The null distribution of each method was analytically derived, and the subsequent inference and genomic analysis framework was established. Simulations were performed to investigate the type I error and power in the univariate case and the control of the false discovery rate (FDR) for (correlated) gene expression data. The proposed methods were also applied to several genomic applications in prostate cancer, major depressive disorder (MDD), colorectal cancer and pain research.
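As a rough illustration of the p-value combination setting described above, the following Python sketch implements the standard Fisher and Stouffer combinations and one plausible reading of mean imputation for a censored study. The function names are hypothetical. The choice to impute the conditional null expectation of the Fisher score, E[-2 log U | U > alpha] with U uniform on (0, 1), is an assumption made for illustration and is not taken from the thesis. The null distribution of the imputed statistic is not chi-square in general, so the sketch returns only the statistic, not a p-value.

```python
import math
from statistics import NormalDist


def chi2_sf_even(x, k):
    """Survival function of a chi-square variable with 2k degrees of freedom.

    For even degrees of freedom it has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(log p) is chi-square(2K) under the null."""
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return stat, chi2_sf_even(stat, len(pvalues))


def stouffer_combine(pvalues):
    """Stouffer's method: the scaled sum of normal scores is N(0, 1) under the null."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return z, 1.0 - nd.cdf(z)


def censored_fisher_score(alpha):
    """E[-2 log U | U > alpha] for U ~ Uniform(0, 1).

    Assumption: this is the value imputed for a study that reports only
    that a gene's p-value exceeded the threshold alpha.
    """
    return 2.0 * (1.0 + alpha * math.log(alpha) / (1.0 - alpha))


def fisher_stat_mean_imputed(observed_pvalues, n_censored, alpha):
    """Fisher-type statistic with censored studies replaced by their mean score.

    Only the statistic is returned; its null distribution differs from
    chi-square and would have to be derived or simulated separately.
    """
    observed = -2.0 * sum(math.log(p) for p in observed_pvalues)
    return observed + n_censored * censored_fisher_score(alpha)
```

For example, `fisher_stat_mean_imputed([0.0004, 0.02], n_censored=1, alpha=0.001)` combines two fully reported studies with one study in which the gene did not pass the 0.001 threshold.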
In the second part, we investigate statistical properties of the adaptively weighted (AW) Fisher's method. The traditional Fisher's method assigns equal weights to all studies, which is simple but cannot always achieve high power across a variety of alternative hypothesis settings. Intuitively, more weight should be assigned to the studies with higher power to detect differences between conditions. In the AW-Fisher's method, the best binary 0/1 weights are determined by minimizing the p-value of the weighted test statistic. Using an order statistics technique, the search space for the adaptive weights is reduced from exponential to linear size, which lowers the computational cost dramatically. A closed form was derived to compute the p-values for K = 2, and an importance sampling algorithm was proposed to evaluate the p-values for K > 2. Some theoretical properties of the AW-Fisher's method, such as consistency and asymptotic Bahadur optimality (ABO), have also been investigated. Simulations are performed to verify the asymptotic Bahadur optimality of the AW-Fisher's method and to compare the performance of the AW-Fisher's and Fisher's methods.
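The weight search described above can be sketched in Python as follows, assuming K denotes the number of combined studies and that each study supplies one p-value per gene. For a fixed number j of selected studies, the weighted Fisher statistic is largest when the j smallest p-values are chosen, so only K candidate weight vectors need to be checked instead of 2^K - 1. The function names are hypothetical. The final routine uses plain Monte Carlo under independent uniform null p-values as a simple stand-in; it is not the closed form or the importance sampling algorithm mentioned in the abstract.

```python
import math
import random
from itertools import product


def chi2_sf_even(x, k):
    """Survival function of a chi-square variable with 2k degrees of freedom."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def aw_fisher_stat(pvalues):
    """Minimum weighted-Fisher p-value over 0/1 weights, via the linear search.

    Returns (minimum p-value, weight vector in the original study order).
    """
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    best_p, best_j, running = float("inf"), 0, 0.0
    for j, idx in enumerate(order, start=1):
        running += -2.0 * math.log(pvalues[idx])  # add the j-th smallest p-value
        p = chi2_sf_even(running, j)
        if p < best_p:
            best_p, best_j = p, j
    weights = [0] * len(pvalues)
    for idx in order[:best_j]:
        weights[idx] = 1
    return best_p, weights


def aw_fisher_stat_bruteforce(pvalues):
    """Same minimum, found by enumerating all non-zero 0/1 weight vectors."""
    best = float("inf")
    for w in product((0, 1), repeat=len(pvalues)):
        k = sum(w)
        if k == 0:
            continue
        stat = -2.0 * sum(wi * math.log(p) for wi, p in zip(w, pvalues))
        best = min(best, chi2_sf_even(stat, k))
    return best


def aw_fisher_pvalue_mc(pvalues, n_sim=20000, seed=0):
    """Plain Monte Carlo p-value of the AW statistic (illustrative stand-in)."""
    rng = random.Random(seed)
    observed, _ = aw_fisher_stat(pvalues)
    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], which keeps log() finite.
        null_p = [1.0 - rng.random() for _ in pvalues]
        if aw_fisher_stat(null_p)[0] <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)
```

The minimized p-value is a test statistic and not itself a valid p-value, because taking the minimum over weights makes it smaller than uniform under the null; this is why a separate null calculation such as `aw_fisher_pvalue_mc` is needed.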
Meta-analysis of multiple genomic studies increases the statistical power of biomarker detection and therefore the work in this thesis could improve public health by providing more effective methodologies for biomarker detection in the integration of multiple genomic studies when the information is incomplete or when different hypothesis settings are tested.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators | Email | Pitt Username | ORCID
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Committee Chair | Tseng, George | ctseng@pitt.edu | CTSENG |
Committee Member | Weeks, Daniel E. | weeks@pitt.edu | WEEKS |
Committee Member | Wahed, Abdus S | wahed@pitt.edu | WAHED |
Committee Member | Jeong, Jong-Hyeon | jeong@nsabp.pitt.edu | JJEONG |
Committee Member | Feingold, Eleanor | feingold@pitt.edu | FEINGOLD |
Date: 27 June 2014
Date Type: Publication
Defense Date: 10 April 2014
Approval Date: 27 June 2014
Submission Date: 11 April 2014
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Number of Pages: 93
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: meta-analysis, imputation, asymptotically Bahadur optimality
Date Deposited: 27 Jun 2014 20:32
Last Modified: 01 May 2019 05:15

