
Hypothesis settings and methods for gene expression meta-analysis

Song, Chi (2012) Hypothesis settings and methods for gene expression meta-analysis. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

This is the latest version of this item.



With the advent of high-throughput technologies, biomedical research has been dramatically reshaped over the past two decades. Technologies such as microarrays are widely used to study the relationship between genomic alterations and disease outcomes. However, genomic analyses have been criticized for their low reproducibility and generalizability. Large-scale meta-analysis of multiple studies is therefore a timely problem with great public health significance, because meta-analysis techniques can identify robust biomarkers for complex human diseases such as major depressive disorder. Accurate marker detection will improve disease diagnosis, treatment selection, and prognosis prediction.
In this dissertation, I first illustrate different hypothesis settings for two types of biomarkers: genes that are differentially expressed (DE) “in all” studies and genes that are DE “in any” study. I then propose a robust setting, HSr, to detect genes that are DE “in the majority of” studies. For HSr, I propose the rth order statistic of the p-values across the combined studies (the rth ordered p-value, rOP) as the test statistic. I explore statistical properties of rOP, such as its power and asymptotic behavior, apply the method to three examples to demonstrate its robustness and sensitivity, and develop two methods to guide the selection of r.
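
As a minimal sketch of the rOP statistic, assume one gene has p-values from K independent studies and that, under the global null (the gene is DE in none of the studies), these p-values are independent Uniform(0, 1) draws. The rth smallest p-value p_(r) then follows a Beta(r, K - r + 1) distribution, so its significance is P(p_(r) <= x) = sum over j = r..K of C(K, j) x^j (1 - x)^(K - j). Setting r = 1 gives the minimum p-value (minP) and r = K the maximum p-value (maxP), the two extremes of the "in any" and "in all" settings. The function names `rop_statistic` and `rop_pvalue` and the example p-values are hypothetical, and the uniform, independent null is an assumption of this sketch rather than a statement of the dissertation's exact inference procedure.

```python
from math import comb


def rop_statistic(pvalues, r):
    """Return the r-th smallest p-value (r is 1-indexed) across K studies."""
    k = len(pvalues)
    if not 1 <= r <= k:
        raise ValueError("r must satisfy 1 <= r <= K")
    return sorted(pvalues)[r - 1]


def rop_pvalue(pvalues, r):
    """Significance of the r-th order p-value, ASSUMING the K p-values are
    independent Uniform(0, 1) draws, so p_(r) ~ Beta(r, K - r + 1)."""
    k = len(pvalues)
    x = rop_statistic(pvalues, r)
    # p_(r) <= x exactly when at least r of the K uniforms fall at or below x,
    # so the Beta CDF equals an upper binomial tail probability.
    return sum(comb(k, j) * x**j * (1 - x) ** (k - j) for j in range(r, k + 1))


# Hypothetical p-values for one gene in K = 5 studies.
p = [0.30, 0.01, 0.60, 0.04, 0.02]
print(rop_statistic(p, 3))  # 0.04
print(rop_pvalue(p, 3))     # about 6.02e-04
print(rop_pvalue(p, 1))     # minP special case: 1 - 0.99**5, about 0.049
print(rop_pvalue(p, 5))     # maxP special case: 0.6**5, about 0.0778
```

With K = 5, choosing r = 3 asks whether the gene looks DE in a majority of the studies: the statistic is small only if at least three studies report small p-values, so a single outlying study can neither create nor mask a signal on its own.
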
Because the null and alternative hypotheses of HSr are not complementary, inference under HSr can be anti-conservative. To overcome this, I propose HS′r, a complementary form of HSr. For HS′r, the main obstacle is that the null distribution is a mixture. Taking a Bayesian approach, I propose a semiparametric mixture model for the observed p-values across the combined studies, and a Bayes factor calculated from the posterior distribution replaces traditional hypothesis testing for HS′r. I also develop an expectation-maximization (EM) algorithm to fit this model. Simulations and real data analyses show that this approach improves specificity and sensitivity compared with traditional methods.
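
This page does not specify the dissertation's semiparametric mixture model or its EM updates. As a hedged illustration of how an EM algorithm fits a mixture model to p-values, the sketch below swaps in a simpler, fully parametric beta-uniform mixture, f(p) = pi0 + (1 - pi0) * a * p^(a - 1) with 0 < a < 1: a uniform null component plus a decreasing Beta(a, 1) alternative. This model is an assumption chosen for illustration, not the author's model, and the function name `fit_beta_uniform_em`, its starting values, and the example p-values are hypothetical.

```python
import math


def fit_beta_uniform_em(pvalues, pi0=0.5, a=0.5, max_iter=500, tol=1e-10):
    """EM fit of the ASSUMED mixture f(p) = pi0 + (1 - pi0) * a * p**(a - 1).

    Returns (pi0, a, post), where post[i] is the posterior probability that
    p-value i comes from the non-null Beta(a, 1) component.
    """
    if any(not 0 < p <= 1 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    logs = [math.log(p) for p in pvalues]
    n = len(pvalues)

    def responsibilities(pi0, a):
        # E-step: P(non-null | p) under the current parameter values
        out = []
        for p in pvalues:
            alt = (1 - pi0) * a * p ** (a - 1)
            out.append(alt / (pi0 + alt))
        return out

    for _ in range(max_iter):
        w = responsibilities(pi0, a)
        # M-step: closed-form maximizers of the expected complete-data log-likelihood
        new_pi0 = 1 - sum(w) / n
        weighted_log = sum(wi * li for wi, li in zip(w, logs))
        new_a = -sum(w) / weighted_log if weighted_log < 0 else 1.0
        new_a = min(max(new_a, 1e-6), 1 - 1e-6)  # keep Beta(a, 1) decreasing
        done = abs(new_pi0 - pi0) < tol and abs(new_a - a) < tol
        pi0, a = new_pi0, new_a
        if done:
            break
    return pi0, a, responsibilities(pi0, a)


# Hypothetical p-values: a few very small ones plus an evenly spread block.
pvals = [0.001, 0.002, 0.003, 0.004, 0.005] + [0.05 + 0.1 * i for i in range(10)]
pi0, a, post = fit_beta_uniform_em(pvals)
print(round(pi0, 3), round(a, 3))
print([round(w, 3) for w in post])
```

Both M-step updates are closed-form: pi0 is the average null responsibility, and a maximizes the weighted Beta(a, 1) log-likelihood sum_i w_i [log a + (a - 1) log p_i], which gives a = -sum_i w_i / sum_i w_i log p_i. A semiparametric model would replace the Beta(a, 1) component with a more flexible density, at the cost of a less tidy M-step.
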
Beyond single-gene meta-analysis, I also propose a framework to integrate multiple biological networks. With this approach, a subnetwork conserved across a subset of the datasets can be identified.
In conclusion, this dissertation discusses several open questions in genomic meta-analysis and provides a series of statistical tools to address them.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Song, Chi (email: chs108@pitt.edu; Pitt username: CHS108)
ETD Committee:
Thesis Advisor: Tseng, George (email: ctseng@pitt.edu; Pitt username: CTSENG)
Committee Member: Feingold, Eleanor (email: feingold@pitt.edu; Pitt username: FEINGOLD)
Committee Member: Lu, Xinghua (email: xinghua@pitt.edu; Pitt username: XINGHUA)
Committee Member: Weissfeld, Lisa A (email: lweis@pitt.edu; Pitt username: LWEIS)
Date: 29 June 2012
Date Type: Completion
Defense Date: 16 April 2012
Approval Date: 29 June 2012
Submission Date: 4 April 2012
Access Restriction: 5 years (access restricted to the University of Pittsburgh for a period of 5 years)
Number of Pages: 80
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Meta-analysis, Microarray, Genomics
Date Deposited: 29 Jun 2012 21:20
Last Modified: 29 Jun 2017 05:15

Available Versions of this Item

  • Hypothesis settings and methods for gene expression meta-analysis. (deposited 29 Jun 2012 21:20)

