Genome-wide power calculation and experimental design in RNA-Seq experiment

Liao, Ge (2014) Genome-wide power calculation and experimental design in RNA-Seq experiment. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Next Generation Sequencing (NGS) technology is emerging as an appealing tool for characterizing the genomic profiles of target populations. However, high sequencing expense and bioinformatic complexity will remain obstacles for many biomedical projects in the foreseeable future. Modelling NGS data involves not only sample size and genome-wide power inference but also sequencing depth and the properties of count data. Given a total budget and pre-specified cost parameters, such as the unit costs of sequencing and sample collection, researchers usually seek an optimal decision in two dimensions: sample size and sequencing depth.
In this dissertation, I introduce a novel method, SeqDEsign, developed to predict the genome-wide power (EDR) of detecting differentially expressed (DE) genes in an RNA-Seq experiment at a targeted sample size (N’) and read depth (R’), given pilot data (N, R). The aim is to advise researchers on designing RNA-Seq experiments under a limited budget.
The first part of this dissertation addresses predicting genome-wide power at N’ with R fixed. The pipeline started with hypothesis testing for differentially expressed gene detection, based on the Wald test and a negative binomial assumption. We proposed ways to model the p-value distribution directly, using both parametric and semi-parametric mixture models. To predict the genome-wide power of DE gene detection at N’, posterior approaches based on either a parametric or a non-parametric model were implemented.
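As a rough illustration of what a parametric mixture model for p-values can look like, the sketch below fits a beta-uniform mixture: null genes contribute Uniform(0, 1) p-values and DE genes contribute Beta(a, 1) p-values with a < 1. This particular mixture form, the grid-search fit, and all function names are assumptions made for illustration; the abstract does not specify the model used in SeqDEsign.

```python
import math
import random

def bum_density(p, lam, a):
    """Assumed mixture density: lam * Uniform(0,1) + (1 - lam) * Beta(a, 1)."""
    return lam + (1.0 - lam) * a * p ** (a - 1.0)

def fit_bum(pvalues, grid=20):
    """Maximum likelihood over a coarse grid of (lam, a), both in (0, 1)."""
    best_lam, best_a, best_ll = None, None, -math.inf
    for i in range(1, grid):
        lam = i / grid
        for j in range(1, grid):
            a = j / grid
            ll = sum(math.log(bum_density(p, lam, a)) for p in pvalues)
            if ll > best_ll:
                best_lam, best_a, best_ll = lam, a, ll
    return best_lam, best_a

def posterior_de(p, lam, a):
    """Posterior probability that a gene with p-value p is DE under the fit."""
    return (1.0 - lam) * a * p ** (a - 1.0) / bum_density(p, lam, a)

def simulate_pvalues(n, lam, a, seed=0):
    """Draw p-values from the assumed mixture (toy data, not real RNA-Seq)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], so p is never exactly 0
        is_null = rng.random() < lam
        # Beta(a, 1) has CDF p**a, so its inverse CDF is u**(1/a).
        out.append(u if is_null else u ** (1.0 / a))
    return out

if __name__ == "__main__":
    pvals = simulate_pvalues(2000, lam=0.8, a=0.2, seed=1)
    lam_hat, a_hat = fit_bum(pvals)
    print("estimated null proportion:", lam_hat, "shape:", a_hat)
    print("posterior P(DE | p = 0.001):", posterior_de(0.001, lam_hat, a_hat))
```

Under this assumed model, small p-values receive a high posterior probability of being DE, and the fitted null proportion gives a sense of how many genes carry signal in the pilot data.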
In the second part, we discussed ways to extend power prediction to N’ and R’ simultaneously. Both a nested down-sampling (NDS) scheme and a model-based (MB) method were proposed and compared. The three-dimensional EDR surface (Pow(N’,R’)) was constructed using a two-way inverse power law model.
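To make the idea of a two-way inverse power law surface concrete, the sketch below evaluates one plausible form, Pow(N’, R’) = 1 - a·N’^(-b) - c·R’^(-d), on a small grid. The functional form, the coefficient values, and the choice of millions of reads as the depth unit are assumptions for illustration only; the abstract does not give the exact parameterization.

```python
def edr_surface(n, r, a, b, c, d):
    """Assumed two-way inverse power law: 1 - a*n^(-b) - c*r^(-d), clipped to [0, 1].

    n: samples per group; r: read depth (here, millions of reads per sample).
    """
    value = 1.0 - a * n ** (-b) - c * r ** (-d)
    return max(0.0, min(1.0, value))

if __name__ == "__main__":
    # Hypothetical coefficients, not estimates from any real pilot data.
    coef = dict(a=1.5, b=1.0, c=2.0, d=0.5)
    for n in (5, 10, 20, 40):
        row = [round(edr_surface(n, r, **coef), 3) for r in (25, 50, 100, 200)]
        print("N' =", n, row)
```

With positive coefficients, predicted power rises toward 1 as either the sample size or the read depth grows, with diminishing returns in both directions.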
Finally, we discussed the cost-benefit analysis of RNA-Seq experiments given a specified cost function. We also explored answers to other practical questions in experimental design. The framework was illustrated in both simulations and a real data application to rat RNA-Seq data.
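A cost-benefit search of this kind can be sketched as a grid search: given a budget and a cost function, pick the (N’, R’) pair with the highest predicted power among the affordable designs. The linear cost function, the unit costs, and the toy power function below are all hypothetical and stand in for whatever the dissertation actually specifies.

```python
def total_cost(n, r, cost_per_sample, cost_per_million_reads):
    """Assumed cost function: each sample costs a fixed amount plus its sequencing."""
    return n * (cost_per_sample + cost_per_million_reads * r)

def best_design(budget, cost_per_sample, cost_per_million_reads,
                n_values, r_values, power_fn):
    """Return (n, r, power) maximizing power_fn within budget, or None if none fit."""
    best = None
    for n in n_values:
        for r in r_values:
            if total_cost(n, r, cost_per_sample, cost_per_million_reads) > budget:
                continue  # design is not affordable
            power = power_fn(n, r)
            if best is None or power > best[2]:
                best = (n, r, power)
    return best

def toy_power(n, r):
    """Hypothetical power surface used only to exercise the search."""
    return max(0.0, min(1.0, 1.0 - 1.5 / n - 2.0 / r ** 0.5))

if __name__ == "__main__":
    design = best_design(
        budget=10000, cost_per_sample=100, cost_per_million_reads=10,
        n_values=range(2, 51), r_values=range(10, 201, 10), power_fn=toy_power,
    )
    print("best affordable design (N', R', predicted power):", design)
```

Passing the power function as an argument keeps the search independent of how the power surface is estimated, so a fitted surface could replace the toy one without changing the search.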
The public health relevance of this work lies in the development of a novel methodology for genome-wide power calculation in RNA-Seq experiments. By accurately predicting genome-wide power, researchers can detect more biologically meaningful biomarkers, which will promote a better understanding of human disease.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Liao, Ge
ETD Committee:
Committee Chair: Tseng, George C. (email: ctseng@pitt.edu; Pitt username: CTSENG)
Committee Member: Oesterreich, Steffi (email: sto16@pitt.edu; Pitt username: STO16)
Committee Member: Feingold, Eleanor (email: feingold@pitt.edu; Pitt username: FEINGOLD)
Committee Member: Park, Yong Seok (email: yongpark@pitt.edu; Pitt username: YONGPARK)
Committee Member: Lin, Yan (email: yal14@pitt.edu; Pitt username: YAL14)
Date: 29 September 2014
Date Type: Publication
Defense Date: 27 June 2014
Approval Date: 29 September 2014
Submission Date: 10 July 2014
Access Restriction: 3 year -- Restrict access to University of Pittsburgh for a period of 3 years.
Number of Pages: 117
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Next Generation Sequencing, RNA-Seq data, Power calculation, Sample size, Mixture model, Cost-benefit analysis, Experiment design
Date Deposited: 29 Sep 2014 21:01
Last Modified: 01 Sep 2017 05:15

