Investigations on genomic meta-analysis: imputation for incomplete data and properties of adaptively weighted fisher's method

Tang, Shaowu (2014) Investigations on genomic meta-analysis: imputation for incomplete data and properties of adaptively weighted fisher's method. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

PDF (Submitted Version, 1MB)

Abstract

Microarray analysis, which monitors expression activity in thousands of genes simultaneously, has become a routine experiment in biomedical research during the past decade. The expression data generated by high-throughput experiments may consist of thousands of variables and therefore pose great challenges to researchers pursuing a wide variety of objectives. A commonly encountered problem is the detection of genes differentially expressed between two or more conditions, which is the major concern of this thesis.
In the first part of the thesis, we consider imputation of incomplete data in transcriptomic meta-analysis. In the past decade, a tremendous number of expression profiles have been generated and stored in the public domain, and information integration by meta-analysis to detect differentially expressed (DE) genes has become popular as a way to obtain increased statistical power and validated findings. Methods that combine p-values have been widely used in such a genomic setting, among which Fisher's, Stouffer's, minP and maxP methods are the most popular. In practice, the raw data or genome-wide p-values of DE evidence needed for such a combination are often not available. Instead, only the lists of detected DE genes under a certain p-value threshold (e.g. DE genes with p-value < 0.001) are reported in journal publications. This truncated p-value information rules out the aforementioned meta-analysis methods, and researchers are forced to apply the less efficient vote counting method or to naively drop the studies with incomplete information. In this thesis, effective imputation methods were derived for such situations with partially censored p-values. We developed and compared three imputation methods (mean imputation, single random imputation and multiple imputation) for a general class of evidence aggregation methods, of which the Fisher, Stouffer and logit methods are special examples. The null distribution of each method was analytically derived, and the subsequent inference and genomic analysis framework were established. Simulations were performed to investigate the type I error and power for the univariate case and the control of the false discovery rate (FDR) for (correlated) gene expression data. The proposed methods were also applied to several genomic applications in prostate cancer, major depressive disorder (MDD), colorectal cancer and pain research.
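As a rough illustration of the ideas in this part, the following Python sketch combines p-values with Fisher's method and handles a censored study by mean imputation. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names are hypothetical, the imputed value is taken to be the null conditional mean of -2 log p given p > threshold, and the null distribution is approximated by plain Monte Carlo simulation rather than derived analytically as in the thesis.

```python
import math
import random


def fisher_combine(pvalues):
    """Fisher's method: T = -2 * sum(log p_k), chi-square with 2K df under H0."""
    k = len(pvalues)
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    half = stat / 2.0
    # Closed-form chi-square survival function, valid because the df (2K) is even.
    pval = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return stat, pval


def mean_imputed_term(alpha):
    """E[-2 log p | p > alpha] for p ~ Uniform(0, 1) (assumed imputation rule)."""
    return 2.0 * (1.0 + alpha * math.log(alpha) / (1.0 - alpha))


def fisher_stat_mean_imputed(pvalues, thresholds):
    """pvalues[k] is None when study k only reveals that p_k > thresholds[k]."""
    total = 0.0
    for p, alpha in zip(pvalues, thresholds):
        total += mean_imputed_term(alpha) if p is None else -2.0 * math.log(p)
    return total


def monte_carlo_null_pvalue(observed_stat, thresholds, n_sim=20000, seed=0):
    """Approximate the null p-value; thresholds[k] is None for a complete study."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        stat = 0.0
        for alpha in thresholds:
            u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
            if alpha is not None and u > alpha:
                stat += mean_imputed_term(alpha)  # p-value would be censored
            else:
                stat += -2.0 * math.log(u)  # p-value would be reported
        exceed += stat >= observed_stat
    return (exceed + 1) / (n_sim + 1)


# Hypothetical gene: two complete studies, one study that only reports p < 0.001.
thresholds = [None, None, 0.001]
observed = fisher_stat_mean_imputed([0.004, 0.03, None], thresholds)
print(observed, monte_carlo_null_pvalue(observed, thresholds))
```

Because an imputed term is a constant rather than a chi-square variate, the usual chi-square reference distribution no longer applies once any study is censored, which is why the sketch simulates the null instead of calling `fisher_combine` on imputed values.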
In the second part, we investigate statistical properties of the adaptively weighted (AW) Fisher's method. The traditional Fisher's method assigns equal weights to each study, which is simple in nature but cannot always achieve high power across a variety of alternative hypothesis settings. Intuitively, more weight should be assigned to the studies with high power to detect the difference between conditions. In the AW-Fisher's method, the best binary 0/1 weights are determined by minimizing the p-value of the weighted test statistic. By using an order statistics technique, the search space for the adaptive weights is reduced from exponential to linear size, which reduces the computational complexity dramatically. A closed form was derived to compute the p-values for K = 2, and an importance sampling algorithm was proposed to evaluate the p-values for K > 2. Some theoretical properties of the AW-Fisher's method, such as consistency and asymptotic Bahadur optimality (ABO), have also been investigated. Simulations are performed to verify the asymptotic Bahadur optimality of AW-Fisher and to compare the performance of the AW-Fisher and Fisher's methods.
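The order statistics argument can be sketched in Python as follows. For a fixed number j of selected studies, the weighted Fisher statistic is largest (and its p-value smallest) when the j smallest p-values are selected, so only K nested candidates need to be checked instead of all 2^K - 1 non-empty weight vectors. This is a minimal sketch with hypothetical function names; the final p-value is approximated by plain Monte Carlo simulation, which stands in for, and is not, the closed form or importance sampling algorithm described in the abstract.

```python
import itertools
import math
import random


def chi2_sf_even(stat, k):
    """Survival function of a chi-square with 2k df (closed form for even df)."""
    half = stat / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))


def aw_fisher_stat(pvalues):
    """Minimum weighted-Fisher p-value and its 0/1 weights via a linear search."""
    order = sorted(range(len(pvalues)), key=lambda k: pvalues[k])
    best_p, best_j, running = float("inf"), 0, 0.0
    for j, k in enumerate(order, start=1):
        running += -2.0 * math.log(pvalues[k])  # add the j-th smallest p-value
        p_j = chi2_sf_even(running, j)
        if p_j < best_p:
            best_p, best_j = p_j, j
    weights = [0] * len(pvalues)
    for k in order[:best_j]:
        weights[k] = 1
    return best_p, weights


def aw_fisher_stat_exhaustive(pvalues):
    """Same minimum by brute force over all non-empty 0/1 weight vectors."""
    best = float("inf")
    for w in itertools.product([0, 1], repeat=len(pvalues)):
        if any(w):
            stat = -2.0 * sum(wk * math.log(p) for wk, p in zip(w, pvalues))
            best = min(best, chi2_sf_even(stat, sum(w)))
    return best


def aw_fisher_pvalue_mc(pvalues, n_sim=20000, seed=0):
    """Monte Carlo p-value of the AW statistic under uniform null p-values."""
    rng = random.Random(seed)
    observed, weights = aw_fisher_stat(pvalues)
    hits = 0
    for _ in range(n_sim):
        null_p = [1.0 - rng.random() for _ in pvalues]  # uniform on (0, 1]
        hits += aw_fisher_stat(null_p)[0] <= observed
    return (hits + 1) / (n_sim + 1), weights


print(aw_fisher_pvalue_mc([0.01, 0.02, 0.9]))
```

For the hypothetical p-values (0.01, 0.02, 0.9), the search keeps the first two studies and drops the third, since adding the uninformative study raises the weighted p-value. The minimized p-value is itself a test statistic, not a valid p-value, which is why a separate null evaluation step is needed.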
Meta-analysis of multiple genomic studies increases the statistical power of biomarker detection and therefore the work in this thesis could improve public health by providing more effective methodologies for biomarker detection in the integration of multiple genomic studies when the information is incomplete or when different hypothesis settings are tested.

Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Tang, Shaowu	shaowutang@gmail.com

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee Chair	Tseng, George	ctseng@pitt.edu	CTSENG
Committee Member	Weeks, Daniel E.	weeks@pitt.edu	WEEKS
Committee Member	Wahed, Abdus S	wahed@pitt.edu	WAHED
Committee Member	Jeong, Jong-Hyeon	jeong@nsabp.pitt.edu	JJEONG
Committee Member	Feingold, Eleanor	feingold@pitt.edu	FEINGOLD

Date:

27 June 2014

Date Type:

Publication

Defense Date:

10 April 2014

Approval Date:

27 June 2014

Submission Date:

11 April 2014

Access Restriction:

5 year -- Restrict access to University of Pittsburgh for a period of 5 years.

Number of Pages:

Institution:

University of Pittsburgh

Schools and Programs:

School of Public Health > Biostatistics

Degree:

PhD - Doctor of Philosophy

Thesis Type:

Doctoral Dissertation

Refereed:

Yes

Uncontrolled Keywords:

meta-analysis, imputation, asymptotically Bahadur optimality

Date Deposited:

27 Jun 2014 20:32

Last Modified:

01 May 2019 05:15

URI:

http://d-scholarship.pitt.edu/id/eprint/21174
