Artifact of Detecting Biomarkers Associated with Sequencing Depth in RNA-Seq

Yin, RuoFei (2023) Artifact of Detecting Biomarkers Associated with Sequencing Depth in RNA-Seq. Master's Thesis, University of Pittsburgh. (Unpublished)

Abstract

RNA-Seq is a highly sensitive and accurate technique that uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological sample at a given moment, which makes it useful for studying the behavior of genes under different biological conditions.[1,2] An essential step in an RNA-Seq study is normalization, in which raw data are adjusted for systematic technical biases such as library size and transcript length.[3] Several popular normalization methods have been proposed and are widely used, including counts per million (CPM), transcripts per million (TPM), and reads per kilobase per million mapped reads (RPKM). Although normalization is expected to eliminate systematic experimental bias and technical variation, we unexpectedly found that a large proportion of genes were associated with library size in normalized RNA-Seq data from human post-mortem striatum. In this thesis, we confirmed that this problem is widespread by systematically examining 159 Gene Expression Omnibus (GEO) datasets and 24 datasets from The Cancer Genome Atlas (TCGA). We conducted a simulation study to rule out potential causes arising from count data quantification. We then examined a potential solution for correcting the artifact, based on a previously published Poisson model with variable rates for different nucleotide patterns. We reproduced the results of that publication and applied the model to these data to see whether library size affected the regression. Linear regression analysis of the model coefficients and library size did not show evidence of an association. As a future direction, we therefore plan to replace the Poisson model with a negative binomial model, which may improve the model fit, and to develop it into a solution that corrects the artifact. If successful, the new normalization will improve association analysis and biomarker detection in basic and clinical studies of disease.
Public health significance: Little research has focused on the artifact of biomarkers that appear associated with sequencing depth in normalized RNA-Seq datasets, an artifact that should be corrected to improve accuracy in downstream translational research. This thesis investigates that artifact.
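
The quantities named in the abstract can be illustrated with a minimal sketch. This is not the thesis's code: the function names, the toy count matrices, and the choice to regress log2(CPM + 1) on log2(library size) are all assumptions made for illustration. It uses the standard definitions of CPM, RPKM and TPM, then estimates for each gene an ordinary least-squares slope of normalized expression against sequencing depth. A slope far from zero is the kind of depth association the abstract describes.

```python
import math


def cpm(counts, library_size):
    """Counts per million: scale each count by the sample's library size."""
    return [c / library_size * 1e6 for c in counts]


def rpkm(counts, lengths_bp, library_size):
    """Reads per kilobase of transcript per million mapped reads."""
    return [c / (n / 1e3) / (library_size / 1e6)
            for c, n in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / (n / 1e3) for c, n in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def ols_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def depth_slopes(count_matrix):
    """Per-gene slope of log2(CPM + 1) on log2(library size).

    count_matrix is a list of samples, each a list of raw gene counts.
    Library size is taken to be the sample's total count (an assumption).
    """
    lib_sizes = [sum(sample) for sample in count_matrix]
    log_depth = [math.log2(s) for s in lib_sizes]
    log_cpm = [[math.log2(v + 1) for v in cpm(sample, s)]
               for sample, s in zip(count_matrix, lib_sizes)]
    n_genes = len(count_matrix[0])
    return [ols_slope(log_depth, [row[g] for row in log_cpm])
            for g in range(n_genes)]


# Hypothetical toy data: three samples, each twice as deep as the last.
# Every gene keeps the same share of reads, so CPM does not depend on depth.
proportional_counts = [
    [100, 300, 600],
    [200, 600, 1200],
    [400, 1200, 2400],
]

# Hypothetical toy data in which gene 0's share of reads grows with depth.
skewed_counts = [
    [100, 900],
    [300, 1700],
    [1000, 3000],
]

print(depth_slopes(proportional_counts))
print(depth_slopes(skewed_counts))
```

In the first toy matrix every slope is numerically zero, which is what normalization is expected to achieve. In the second, gene 0 has a positive slope and gene 1 a negative one, even after CPM normalization.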

Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Yin, RuoFei	RUY28@pitt.edu	RUY28	0009-0005-4117-380X

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee Chair	Tseng, George C	ctseng@pitt.edu	ctseng
Committee Member	Carlson, Jenna Colavincenzo	jnc35@pitt.edu	jnc35
Committee Member	Fan, Kang-Hsien	frank.fan@pitt.edu	frank.fan

Date:

11 May 2023

Date Type:

Publication

Defense Date:

21 May 2023

Approval Date:

11 May 2023

Submission Date:

27 April 2023

Access Restriction:

No restriction; Release the ETD for access worldwide immediately.

Number of Pages:

Institution:

University of Pittsburgh

Schools and Programs:

School of Public Health > Biostatistics

Degree:

MS - Master of Science

Thesis Type:

Master's Thesis

Refereed:

Yes

Uncontrolled Keywords:

RNA-Seq; Normalization; Sequencing depth

Related URLs:

Modeling non-uniformity in short-read rates in RNA-Seq data

Date Deposited:

11 May 2023 16:55

Last Modified:

11 May 2023 16:55

URI:

http://d-scholarship.pitt.edu/id/eprint/44782
