Biomarker Discovery in Exome Data

Wong, An-kwok Ian (2016) Biomarker Discovery in Exome Data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

Current DNA sequencing technology enables inexpensive sequencing of the exome, the protein-coding regions of the genome. The primary goal of exome data analysis is to identify sequence variants, such as single nucleotide variations (SNVs), that help elucidate the genetic causes of common polygenic diseases such as Alzheimer's disease and chronic pancreatitis. Exome data analysis presents several challenges: the large number of SNVs relative to the small sample size, the rarity of many SNVs, and potential interactions among SNVs in their effects on disease.

In this work, I develop, implement, and evaluate a new multivariate biomarker ranking algorithm called Bayesian averaged probabilistic rules (BAPR) that has several novel characteristics. It (1) learns probabilistic rule models from data, (2) performs Bayesian model averaging to rank biomarkers such as SNVs, and (3) incorporates biological knowledge as structure priors over biomarkers. The BAPR algorithm was evaluated on several exome datasets with both synthetic and real outcomes, using a range of variant deleteriousness scores as structure priors. The quality of the SNV rankings was assessed with biomarker recovery plots, the area under the receiver operating characteristic curve (AUC), and evidence of biological validity from the literature.
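The model-averaging step in (2) and the structure prior in (3) can be illustrated with a small sketch. It assumes the rule models have already been learned and that each model is summarized by the set of SNVs its rules use, its log marginal likelihood log P(D | M), and its log structure prior log P(M). The function names, the independent-inclusion form of the prior, and the example numbers are hypothetical illustrations, not BAPR's actual implementation. Each SNV's score is the total posterior probability of the models that contain it.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical model summary: (SNVs used by the model's rules,
#                              log marginal likelihood log P(D | M),
#                              log structure prior log P(M))
Model = Tuple[FrozenSet[str], float, float]


def log_structure_prior(snvs: FrozenSet[str], inclusion_prob: Dict[str, float]) -> float:
    """Assumed independent-inclusion prior: SNV i enters a model with probability p_i in (0, 1)."""
    return sum(
        math.log(p) if snv in snvs else math.log(1.0 - p)
        for snv, p in inclusion_prob.items()
    )


def posterior_weights(models: List[Model]) -> List[float]:
    """Normalize log P(D | M) + log P(M) into posterior model probabilities."""
    log_scores = [log_lik + log_prior for _, log_lik, log_prior in models]
    top = max(log_scores)  # subtract the max (log-sum-exp trick) to avoid underflow
    unnorm = [math.exp(s - top) for s in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def rank_snvs(models: List[Model]) -> List[Tuple[str, float]]:
    """Score each SNV by the summed posterior probability of the models containing it."""
    scores: Dict[str, float] = {}
    for (snvs, _, _), weight in zip(models, posterior_weights(models)):
        for snv in snvs:
            scores[snv] = scores.get(snv, 0.0) + weight
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Example with made-up log marginal likelihoods and a uniform prior (p_i = 0.5).
inclusion = {"rs1": 0.5, "rs2": 0.5, "rs3": 0.5}
candidates = [
    (frozenset({"rs1"}), -10.0),
    (frozenset({"rs1", "rs2"}), -10.5),
    (frozenset({"rs3"}), -12.0),
]
models = [(snvs, ll, log_structure_prior(snvs, inclusion)) for snvs, ll in candidates]
ranking = rank_snvs(models)
print(ranking)  # rs1 first (about 0.92), then rs2 (about 0.35), then rs3 (about 0.08)
```

With equal inclusion probabilities the prior is identical for every model, so the ranking is driven by the likelihoods alone. Raising one SNV's inclusion probability (for example, from a deleteriousness score) shifts posterior mass toward the models that contain it.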

The BAPR algorithm performed statistically significantly better than chi-square and random forests at identifying previously known disease-associated SNVs and biologically meaningful SNVs. BAPR with uniform and expected-number-of-predictors priors performed better than BAPR with priors derived from variant deleteriousness scores. Combining several variant deleteriousness scores performed at least as well as the best-performing single score. Variant deleteriousness scores have sparse coverage: scores are typically available for only a small proportion of the SNVs measured in an exome dataset. The encouraging results obtained with these scores suggest that, as their coverage increases, the performance of algorithms such as BAPR that incorporate them will also improve.
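For comparison, a univariate chi-square baseline and a simple recovery count can be sketched as follows. This assumes each SNV is coded as carrier (1) or non-carrier (0), the outcome is binary case/control, and a "biomarker recovery" curve counts how many known disease-associated SNVs appear among the top k ranked SNVs. These encodings and all function names are assumptions for illustration, not the dissertation's exact procedure.

```python
from typing import Dict, List, Sequence, Set, Tuple


def chi_square_2x2(carrier: Sequence[int], outcome: Sequence[int]) -> float:
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table of
    carrier status (0/1) versus case/control status (0/1)."""
    if len(carrier) != len(outcome):
        raise ValueError("carrier and outcome must have the same length")
    table = [[0, 0], [0, 0]]
    for c, y in zip(carrier, outcome):
        table[c][y] += 1
    n = len(carrier)
    row = [table[i][0] + table[i][1] for i in range(2)]
    col = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:  # empty margins (e.g., an SNV with no carriers) add nothing
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def rank_by_chi_square(genotypes: Dict[str, Sequence[int]],
                       outcome: Sequence[int]) -> List[Tuple[str, float]]:
    """Rank SNVs by their individual chi-square statistic, largest first."""
    scores = {snv: chi_square_2x2(g, outcome) for snv, g in genotypes.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def recovery_curve(ranking: List[Tuple[str, float]], known: Set[str]) -> List[int]:
    """Hypothetical recovery count: known SNVs among the top k ranked SNVs, k = 1, 2, ..."""
    found, curve = 0, []
    for snv, _ in ranking:
        found += snv in known
        curve.append(found)
    return curve


# Toy data: 4 cases followed by 4 controls.
outcome = [1, 1, 1, 1, 0, 0, 0, 0]
genotypes = {
    "rsA": [1, 1, 1, 0, 0, 0, 0, 0],  # carriers concentrated among cases
    "rsB": [1, 0, 1, 0, 1, 0, 1, 0],  # carriers balanced between groups
    "rsC": [0, 0, 0, 0, 0, 0, 0, 0],  # no carriers (a rare SNV unseen in this sample)
}
ranking = rank_by_chi_square(genotypes, outcome)
print(ranking)                           # rsA scores about 4.8; rsB and rsC score 0.0
print(recovery_curve(ranking, {"rsA"}))  # [1, 1, 1]
```

A test like this scores each SNV in isolation, so it cannot reward SNVs that matter only in combination, and SNVs with few or no carriers receive little or no signal.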

Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Wong, An-kwok Ian	ian@aiwong.com	AIW5

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee Chair	Visweswaran, Shyam	shv3@pitt.edu	SHV3
Committee Member	Cooper, Gregory F	gfc@cbmi.pitt.edu	GFC
Committee Member	Hauskrecht, Milos	milos@cs.pitt.edu	MILOS
Committee Member	Barmada, M. Michael	barmada@pitt.edu	BARMADA

Date:

3 October 2016

Date Type:

Publication

Defense Date:

27 June 2016

Approval Date:

3 October 2016

Submission Date:

29 June 2016

Access Restriction:

5 year -- Restrict access to University of Pittsburgh for a period of 5 years.

Number of Pages:

113

Institution:

University of Pittsburgh

Schools and Programs:

Dietrich School of Arts and Sciences > Intelligent Systems

Degree:

PhD - Doctor of Philosophy

Thesis Type:

Doctoral Dissertation

Refereed:

Yes

Uncontrolled Keywords:

BAPR, biomarker discovery, exome, exome data, biomarker recovery

Additional Information:

Final draft

Date Deposited:

03 Oct 2016 20:34

Last Modified:

03 Oct 2021 05:15

URI:

http://d-scholarship.pitt.edu/id/eprint/28415
