Song, Chi (2012) Hypothesis settings and methods for gene expression meta-analysis. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
This is the latest version of this item.
Abstract
High-throughput technologies have dramatically reshaped biomedical research over the past two decades. Technologies such as microarrays are widely used to study the relationship between genomic alterations and disease outcomes. However, genomic analyses have been criticized for low reproducibility and generalizability. Large-scale meta-analysis of multiple studies is therefore a timely issue of public health significance: by combining studies, it can identify robust biomarkers for complex human diseases such as major depressive disorder. Accurate marker detection will in turn improve disease diagnosis, treatment selection, and prognosis prediction.
In this dissertation, I first describe the hypothesis settings for two types of biomarkers: those that are differentially expressed (DE) “in all” studies and those that are DE “in any” study. I then propose a robust setting, HSr, to detect genes that are DE “in the majority of” studies. For HSr, I propose as the test statistic an order statistic of the p-values across the combined studies, the rth ordered p-value (rOP). I explore statistical properties of rOP such as power and asymptotic behavior, and apply the method to three examples to demonstrate its robustness and sensitivity. I also develop two methods to guide the selection of r.
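To make the rOP statistic concrete, here is a minimal Python sketch. It is not taken from the dissertation. It assumes that, under a null in which the gene is not DE in any study, the K study p-values are independent and Uniform(0, 1), so that the rth order statistic follows a Beta(r, K - r + 1) distribution. The function names `rop_statistic` and `rop_p_value` are hypothetical.

```python
from math import comb


def rop_statistic(p_values, r):
    """Return the r-th smallest p-value (1-indexed) among the study p-values."""
    if not 1 <= r <= len(p_values):
        raise ValueError("r must be between 1 and the number of studies")
    return sorted(p_values)[r - 1]


def rop_p_value(p_values, r):
    """P-value of rOP, ASSUMING null p-values are i.i.d. Uniform(0, 1).

    The r-th order statistic of K uniforms is Beta(r, K - r + 1), and its
    CDF at x equals P(Binomial(K, x) >= r), which is summed directly here.
    """
    k = len(p_values)
    x = rop_statistic(p_values, r)
    return sum(comb(k, j) * x**j * (1 - x) ** (k - j) for j in range(r, k + 1))


if __name__ == "__main__":
    # Hypothetical p-values for one gene across five studies.
    study_p_values = [0.001, 0.20, 0.004, 0.03, 0.65]
    print(rop_statistic(study_p_values, r=3))
    print(rop_p_value(study_p_values, r=3))
```

Under this assumption, r = 1 reduces to a minimum p-value test and r = K to a maximum p-value test, so intermediate values of r trade off between the “in any” and “in all” settings.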
Because the null and alternative hypotheses of HSr are not complementary, inference under HSr can be anti-conservative. To overcome this, I propose HS′r, a complementary form of HSr. The major obstacle for HS′r is that its null distribution is a mixture. Taking a Bayesian point of view, I propose a semiparametric mixture model for the observed p-values in the combined studies, and develop an expectation-maximization (EM) algorithm to fit it. A Bayes factor calculated from the posterior distribution replaces traditional hypothesis testing for HS′r. Simulation results and real data analysis show that this approach improves specificity and sensitivity compared with traditional methods.
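The abstract does not write the hypotheses out. One plausible formalization, using notation assumed here rather than taken from the dissertation (θ_k is the effect size of a gene in study k, K is the number of studies, and 1 ≤ r ≤ K), is:

```latex
\begin{align*}
\mathrm{HS}_r  &: \quad H_0:\ \theta_1 = \cdots = \theta_K = 0
  \quad \text{vs.} \quad
  H_a:\ \sum_{k=1}^{K} I(\theta_k \neq 0) \ge r, \\
\mathrm{HS}'_r &: \quad H_0:\ \sum_{k=1}^{K} I(\theta_k \neq 0) < r
  \quad \text{vs.} \quad
  H_a:\ \sum_{k=1}^{K} I(\theta_k \neq 0) \ge r.
\end{align*}
```

Under this reading, the null and alternative of HSr leave out genes with between 1 and r − 1 nonzero effects, while the null of HS′r covers them. That null is then composite, combining configurations with different numbers of nonzero effects, which would be consistent with the mixture null distribution described above.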
Beyond meta-analysis of single genes, I also propose a framework to integrate multiple biological networks. With this framework, a conservative subnetwork in a subset of datasets can be identified.
In conclusion, this dissertation discusses several questions in genomic meta-analysis and provides a series of statistical tools to address them.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Date: 29 June 2012
Date Type: Completion
Defense Date: 16 April 2012
Approval Date: 29 June 2012
Submission Date: 4 April 2012
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Number of Pages: 80
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Meta-analysis, Microarray, Genomics
Date Deposited: 29 Jun 2012 21:20
Last Modified: 29 Jun 2017 05:15
URI: http://d-scholarship.pitt.edu/id/eprint/11887