Power calculation and study design in RNA-Seq and Methyl-Seq

Lin, Chien-Wei (2017) Power calculation and study design in RNA-Seq and Methyl-Seq. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

Next-generation sequencing (NGS) technology has emerged as a powerful tool for characterizing genomic profiles. Among its applications, RNA sequencing (RNA-Seq) and methylation sequencing (Methyl-Seq) have gradually become standard tools for transcriptomic and epigenetic monitoring, respectively. Although the costs of NGS experiments have steadily decreased, high sequencing cost and bioinformatic complexity remain obstacles for many biomedical projects. Unlike data from earlier microarray technologies, NGS data are discrete counts, and statistical models need to account for this. In addition to sample size, sequencing depth is directly related to experimental cost. Consequently, given a total budget and a pre-specified unit experimental cost, study design in RNA-Seq/Methyl-Seq is a multi-dimensional constrained optimization problem rather than the one-dimensional sample size calculation of a traditional hypothesis-testing setting. In the first part of this dissertation, we proposed a statistical framework, "RNASeqDesign", that uses pilot data for power calculation and study design of RNA-Seq experiments. The approach was based on fitting a mixture model to the p-value distribution from pilot data, followed by a parametric bootstrap procedure to infer genome-wide power and identify the optimal sample size and sequencing depth. We further illustrated five practical study design tasks for practitioners. We performed simulations and real data applications to evaluate performance and to compare with existing methods.
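To make the budget-constrained design problem concrete, the following is a minimal sketch and not the RNASeqDesign implementation. It assumes a two-group comparison and a simple cost model (a fixed cost per library plus a cost per million reads), and it enumerates candidate sample sizes and depths to find the affordable design with the highest power. The function names, the cost figures, and especially `toy_power` are hypothetical placeholders; in the dissertation, power is inferred from pilot data rather than from a closed-form curve.

```python
from itertools import product


def best_design(budget, cost_per_sample, cost_per_million_reads,
                sample_sizes, depths, power_fn):
    """Return (n_per_group, depth, cost, power) with the highest power
    among designs whose total cost fits the budget, or None if none fit."""
    best = None
    for n, depth in product(sample_sizes, depths):
        # Assumed cost model: two groups of n libraries, each sequenced
        # to `depth` million reads.
        cost = 2 * n * (cost_per_sample + cost_per_million_reads * depth)
        if cost > budget:
            continue
        power = power_fn(n, depth)
        if best is None or power > best[3]:
            best = (n, depth, cost, power)
    return best


def toy_power(n, depth):
    # Hypothetical stand-in for an estimated genome-wide power surface:
    # increasing in both arguments with diminishing returns.
    return (1 - 0.8 ** n) * (1 - 0.9 ** depth)


design = best_design(
    budget=20000,               # hypothetical total budget
    cost_per_sample=200,        # hypothetical library preparation cost
    cost_per_million_reads=20,  # hypothetical sequencing cost
    sample_sizes=range(2, 31),
    depths=[5, 10, 20, 40],
    power_fn=toy_power,
)
print(design)
```

Because both the sample size and the depth enter the cost, raising one forces the other down under a fixed budget, which is why the search is over pairs rather than over sample size alone.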

In the second part, we proposed another statistical framework, "MethylSeqDesign", specifically for Methyl-Seq data. There were two main challenges. First, power calculation for Methyl-Seq data required a powerful statistical test based on a beta-binomial model. Second, the human genome contains an extremely large number of CpG sites (about 30 million), so many CpG sites have very shallow coverage. Hence, we focused on a region-based or capture-based method, which aggregates more counts within each region or window so that power calculation becomes feasible.
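As an illustration of the beta-binomial model mentioned above, the sketch below computes the beta-binomial log-likelihood of methylated read counts in one region. It is not the MethylSeqDesign test itself. The mean/dispersion parameterization (alpha = mean(1 - dispersion)/dispersion, beta = (1 - mean)(1 - dispersion)/dispersion) is an assumption chosen for readability, and the function names and example counts are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_logpmf(k, n, alpha, beta):
    """Log-probability of k methylated reads out of n under a
    beta-binomial distribution with shape parameters alpha and beta."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def region_loglik(methylated, coverage, mean, dispersion):
    """Sum of beta-binomial log-probabilities across samples for one region.
    Assumed parameterization: 0 < mean < 1 and 0 < dispersion < 1."""
    scale = (1 - dispersion) / dispersion
    alpha, beta = mean * scale, (1 - mean) * scale
    return sum(betabinom_logpmf(k, n, alpha, beta)
               for k, n in zip(methylated, coverage))


# Hypothetical counts for three samples in one region.
methylated = [12, 15, 9]
coverage = [20, 22, 18]
print(region_loglik(methylated, coverage, mean=0.6, dispersion=0.05))
```

Summing reads over a region raises the coverage `n` in each term, which is the practical reason a region-level analysis is easier to power than a single shallow CpG site.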

Public health significance: As sequencing costs keep dropping, RNA-Seq and Methyl-Seq experiments will become more prevalent, and more projects with large sample sizes can be expected. We believe our work will provide practical guidance for the design of future studies that aim to understand disease mechanisms and improve disease diagnosis and treatment.

Details

Item Type: University of Pittsburgh ETD
Status: Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Lin, Chien-Wei	chl169@pitt.edu	chl169

ETD Committee:

Title	Member	Email Address	ORCID
Committee Chair	Tseng, George	ctseng@pitt.edu
Committee Member	Park, Yongseok	yongpark@pitt.edu
Committee Member	Weeks, Daniel	weeks@pitt.edu	0000-0001-9410-7228
Committee Member	Krafty, Robert	rkrafty@pitt.edu

Date: 29 June 2017
Date Type: Publication
Defense Date: 14 April 2017
Approval Date: 29 June 2017
Submission Date: 4 March 2017
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Number of Pages:
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Power calculation, Sample size, RNA-Seq data, Methyl-Seq data, Next Generation Sequencing (NGS), p-value mixture model
Date Deposited: 29 Jun 2017 23:44
Last Modified: 30 Jun 2022 15:22
URI: http://d-scholarship.pitt.edu/id/eprint/30934
