Liao, Ge
(2014)
Genome-wide power calculation and experimental design in RNA-Seq experiment.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Next Generation Sequencing (NGS) technology is emerging as an appealing tool for characterizing the genomic profiles of a target population. However, high sequencing costs and bioinformatic complexity will continue to be obstacles for many biomedical projects in the foreseeable future. Modelling NGS data involves not only sample size and genome-wide power inference but also consideration of sequencing depth and the properties of count data. Given a total budget and pre-specified cost parameters, such as the unit costs of sequencing and sample collection, researchers usually seek an optimal decision in two dimensions: sample size and sequencing depth.
In this dissertation, I introduce a novel method, SeqDEsign, developed to predict the genome-wide power (EDR) of detecting differentially expressed (DE) genes in an RNA-Seq experiment at a targeted sample size (N’) and read depth (R’), given pilot data (N, R). We aim to provide guidance to researchers designing RNA-Seq experiments on a limited budget.
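For reference, the quantity being predicted can be computed directly whenever the true DE status of every gene is known, as in a simulation. The sketch below is illustrative only: it assumes EDR is read as average power over the truly DE genes and, to stay short, uses a fixed per-gene p-value cutoff `alpha` instead of an FDR-controlled threshold. The function name `empirical_edr` is hypothetical and not part of SeqDEsign.

```python
from typing import Sequence


def empirical_edr(p_values: Sequence[float], is_de: Sequence[bool], alpha: float = 0.05) -> float:
    """Share of truly DE genes whose p-value falls below alpha (hypothetical helper).

    Only computable when the DE status of every gene is known, e.g. in a simulation.
    A fixed per-gene cutoff is used instead of FDR control to keep the sketch short.
    """
    de_p = [p for p, de in zip(p_values, is_de) if de]
    if not de_p:
        raise ValueError("no truly DE genes in the input")
    return sum(p < alpha for p in de_p) / len(de_p)


# Three truly DE genes, two of which are detected at alpha = 0.05.
print(empirical_edr([0.001, 0.2, 0.03, 0.7], [True, True, True, False]))
```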
The first part of this dissertation addresses predicting genome-wide power at N’ with R held fixed. The pipeline starts with hypothesis testing for differentially expressed genes, using a Wald test under a negative binomial assumption. We proposed directly modelling the p-value distribution with both parametric and semi-parametric mixture models. To predict the genome-wide power of DE gene detection at N’, posterior approaches based on either the parametric or the non-parametric model were implemented.
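The idea of modelling the p-value distribution as a mixture can be illustrated with one familiar parametric choice, the beta-uniform mixture: null p-values are Uniform(0, 1) and DE p-values follow Beta(a, 1) with 0 < a < 1. This is an assumed stand-in, not necessarily the parametric or semi-parametric model used in the dissertation, and it only estimates power at the pilot sample size; extrapolating to N’ needs the dissertation's own posterior approach. Under this model a DE gene passes cutoff alpha with probability alpha**a, which gives a simple EDR estimate. The names `fit_beta_uniform_mixture` and `mixture_edr` are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def fit_beta_uniform_mixture(
    p_values: Sequence[float], n_iter: int = 500, tol: float = 1e-10
) -> Tuple[float, float]:
    """Fit f(p) = pi0 + (1 - pi0) * a * p**(a - 1) by EM and return (pi0, a).

    Assumed model: null p-values ~ Uniform(0, 1), DE p-values ~ Beta(a, 1).
    """
    ps = [min(max(p, 1e-300), 1.0) for p in p_values]  # clamp to avoid log(0)
    logs = [math.log(p) for p in ps]
    pi0, a = 0.5, 0.5  # starting values (a tuning choice)
    for _ in range(n_iter):
        # E-step: posterior probability that each gene belongs to the DE component.
        w = []
        for p in ps:
            alt = (1.0 - pi0) * a * p ** (a - 1.0)
            w.append(alt / (pi0 + alt))
        # M-step: closed-form updates for the mixing weight and the Beta shape.
        sw = sum(w)
        swlog = sum(wi * li for wi, li in zip(w, logs))
        if sw == 0.0 or swlog == 0.0:
            break  # degenerate fit: no evidence for a DE component
        new_pi0 = 1.0 - sw / len(ps)
        new_a = min(-sw / swlog, 1.0 - 1e-6)  # keep the DE density decreasing in p
        done = abs(new_pi0 - pi0) < tol and abs(new_a - a) < tol
        pi0, a = new_pi0, new_a
        if done:
            break
    return pi0, a


def mixture_edr(a: float, alpha: float = 0.05) -> float:
    """Power for a DE gene at cutoff alpha under Beta(a, 1): P(p < alpha) = alpha**a."""
    return alpha ** a


# Usage on synthetic p-values: 80% null, 20% DE drawn from Beta(0.2, 1) by inverse CDF.
rng = random.Random(1)
demo_p = [rng.random() for _ in range(8000)] + [rng.random() ** (1 / 0.2) for _ in range(2000)]
demo_pi0, demo_a = fit_beta_uniform_mixture(demo_p, n_iter=300)
print(f"pi0={demo_pi0:.3f}, a={demo_a:.3f}, EDR(alpha=0.05)={mixture_edr(demo_a):.3f}")
```

EM only guarantees a local increase in likelihood, so the starting values and the iteration cap are adjustable assumptions rather than fixed parts of the method.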
In the second part, we discussed ways to extend power prediction to N’ and R’ simultaneously. Both a nested down-sampling (NDS) scheme and a model-based (MB) method were proposed and compared. The three-dimensional EDR surface, Pow(N’, R’), was constructed with a two-way inverse power law model.
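The exact parameterization of the two-way inverse power law is not stated here, so the sketch below assumes one plausible additive form, EDR(N, R) ≈ a − b·N^(−c) − d·R^(−e), fitted to (N, R, EDR) points such as those a nested down-sampling of pilot data might produce, with R taken as depth relative to the pilot. For fixed exponents (c, e) the model is linear in (a, b, d), so the fit grid-searches the exponents and solves a small least-squares problem for the rest. The functional form, the helper names, and the synthetic parameters are all assumptions for illustration.

```python
import itertools
from typing import List, Optional, Sequence, Tuple

Params = Tuple[float, float, float, float, float]  # (a, b, c, d, e)


def ipl_surface(params: Params, n: float, r: float) -> float:
    """Assumed two-way inverse power law: EDR(N, R) = a - b * N**-c - d * R**-e."""
    a, b, c, d, e = params
    return a - b * n ** (-c) - d * r ** (-e)


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, 3):
            f = aug[i][col] / aug[col][col]  # raises ZeroDivisionError if singular
            for k in range(col, 4):
                aug[i][k] -= f * aug[col][k]
    x = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        x[i] = (aug[i][3] - sum(aug[i][k] * x[k] for k in range(i + 1, 3))) / aug[i][i]
    return x


def fit_two_way_ipl(
    points: Sequence[Tuple[float, float, float]], exponent_grid: Sequence[float]
) -> Optional[Params]:
    """Least-squares fit to (N, R, EDR) points: exponents by grid search, the rest exactly."""
    best: Optional[Tuple[float, Params]] = None
    y = [edr for _, _, edr in points]
    for c, e in itertools.product(exponent_grid, repeat=2):
        # For fixed (c, e) the model is linear in (a, b, d).
        rows = [[1.0, -(n ** (-c)), -(r ** (-e))] for n, r, _ in points]
        xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
        xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(3)]
        try:
            a, b, d = _solve3(xtx, xty)
        except ZeroDivisionError:
            continue  # singular design for this (c, e); skip it
        params = (a, b, c, d, e)
        sse = sum((ipl_surface(params, n, r) - edr) ** 2 for n, r, edr in points)
        if best is None or sse < best[0]:
            best = (sse, params)
    return None if best is None else best[1]


# Synthetic check: noiseless points from a made-up surface, then extrapolation.
true_params: Params = (0.9, 0.6, 0.8, 0.1, 1.0)
grid_points = [(n, r, ipl_surface(true_params, n, r))
               for n in (3, 5, 7, 10) for r in (0.25, 0.5, 0.75, 1.0)]
fitted = fit_two_way_ipl(grid_points, [k / 10 for k in range(1, 21)])
print(fitted, ipl_surface(fitted, 20, 2.0))
```

This assumed form is not automatically bounded by 1, so predictions far outside the sampled (N, R) range should be read with caution.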
Finally, we discussed cost-benefit analysis of RNA-Seq experiments under a specified cost function. We also explored answers to other practical questions of experimental design. This framework was illustrated with both simulations and a real RNA-Seq data set from rat.
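Once a cost function and a predicted power surface are available, the cost-benefit question reduces to a search over candidate designs. The cost function below is an assumption for illustration: each of n samples incurs a fixed collection cost plus a sequencing cost proportional to its depth r (in millions of reads). The sketch returns the affordable (n, r) pair with the highest predicted EDR; `design_cost`, `best_design`, the prices, and the toy surface are hypothetical, and `edr` could be any fitted surface, such as an inverse power law model.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple


def design_cost(n: int, r: float, cost_per_sample: float, cost_per_million_reads: float) -> float:
    """Hypothetical cost: each of n samples pays a fixed cost plus sequencing r million reads."""
    return n * (cost_per_sample + cost_per_million_reads * r)


def best_design(
    edr: Callable[[int, float], float],
    budget: float,
    sample_sizes: Iterable[int],
    depths: Sequence[float],
    cost_per_sample: float,
    cost_per_million_reads: float,
) -> Optional[Tuple[int, float, float]]:
    """Return (n, r, predicted EDR) with the highest EDR among designs within budget."""
    best: Optional[Tuple[int, float, float]] = None
    for n in sample_sizes:
        for r in depths:
            if design_cost(n, r, cost_per_sample, cost_per_million_reads) > budget:
                continue  # design is not affordable
            power = edr(n, r)
            if best is None or power > best[2]:
                best = (n, r, power)
    return best


# Toy surface and prices, purely for illustration.
choice = best_design(
    edr=lambda n, r: 1.0 - 1.0 / n - 1.0 / r,
    budget=1000.0,
    sample_sizes=range(2, 21),
    depths=[5.0, 10.0, 20.0, 40.0],
    cost_per_sample=50.0,
    cost_per_million_reads=10.0,
)
print(choice)
```

Ties keep the first design encountered; other criteria, such as the cheapest design reaching a target EDR, need only a different comparison inside the loop.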
The public health relevance of this work lies in the development of a novel methodology for genome-wide power calculation in RNA-Seq experiments. With accurate predictions of genome-wide power, researchers can detect more biologically meaningful biomarkers, which will promote a better understanding of human disease.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Liao, Ge
Date: 29 September 2014
Date Type: Publication
Defense Date: 27 June 2014
Approval Date: 29 September 2014
Submission Date: 10 July 2014
Access Restriction: 3 years (access restricted to the University of Pittsburgh for a period of 3 years)
Number of Pages: 117
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Next Generation Sequencing, RNA-Seq data, Power calculation, Sample size, Mixture model, Cost-benefit analysis, Experiment design
Date Deposited: 29 Sep 2014 21:01
Last Modified: 01 Sep 2017 05:15
URI: http://d-scholarship.pitt.edu/id/eprint/22281