
Biomarker Discovery in Exome Data

Wong, An-kwok Ian (2016) Biomarker Discovery in Exome Data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Current DNA sequencing technology enables inexpensive sequencing of the exome, the protein-coding regions of the genome. The primary goal of analyzing exome data is to identify sequence variants, such as single nucleotide variations (SNVs), that help elucidate the genetic causes of common polygenic diseases such as Alzheimer's disease and chronic pancreatitis. Exome data analysis presents several challenges: the number of SNVs is large compared to the relatively small sample size, many of the SNVs are rare, and SNVs may interact in their effects on disease.

In this work, I develop, implement, and evaluate a new multivariate biomarker ranking algorithm called Bayesian averaged probabilistic rules (BAPR) that has several novel characteristics: it (1) learns probabilistic rule models from data, (2) performs Bayesian model averaging to rank biomarkers such as SNVs, and (3) incorporates biological knowledge as structure priors over biomarkers. BAPR was evaluated on several exome datasets with both synthetic and real outcomes, using a range of variant deleteriousness scores as structure priors. The quality of the SNV rankings was assessed with biomarker recovery plots, the area under the Receiver Operating Characteristic (ROC) curve, and evidence of biological validity supported by the literature.
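To make the idea of ranking biomarkers by Bayesian model averaging concrete, here is a toy-scale Python sketch. It is not the BAPR implementation from the dissertation. The model space (single rules whose antecedent is a conjunction of at most two variant indicators), the Beta-Binomial marginal likelihood with a symmetric Beta(alpha, alpha) prior, the per-SNV structure prior, and all function names (`rank_snvs` and its helpers) are illustrative assumptions. Each SNV is ranked by its posterior inclusion probability, which is the summed posterior weight of every model that uses it.

```python
import math
from itertools import combinations


def log_beta(a, b):
    # log of the Beta function B(a, b), computed via log-gamma for stability
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal_likelihood(X, y, snvs, alpha=1.0):
    """Beta-Binomial marginal likelihood of a hypothetical two-cell rule model.

    Cell True: samples with a variant at every SNV in `snvs` (the rule fires).
    Cell False: all other samples. Labels in y must be 0 (control) or 1 (case).
    """
    counts = {True: [0, 0], False: [0, 0]}  # cell -> [controls, cases]
    for row, label in zip(X, y):
        fires = all(row[j] > 0 for j in snvs)
        counts[fires][label] += 1
    total = 0.0
    for n_controls, n_cases in counts.values():
        total += log_beta(n_cases + alpha, n_controls + alpha) - log_beta(alpha, alpha)
    return total


def log_structure_prior(snvs, priors):
    # Assumed prior: each SNV j is independently "in the model" with prob priors[j],
    # where every priors[j] must lie strictly between 0 and 1.
    included = set(snvs)
    return sum(math.log(p) if j in included else math.log(1.0 - p)
               for j, p in enumerate(priors))


def rank_snvs(X, y, priors, max_size=2, alpha=1.0):
    """Enumerate all rules with up to `max_size` SNVs, average over them,
    and return (SNV indices sorted by inclusion probability, probabilities)."""
    n_snvs = len(priors)
    models, log_scores = [], []
    for k in range(1, max_size + 1):
        for snvs in combinations(range(n_snvs), k):
            models.append(snvs)
            log_scores.append(log_marginal_likelihood(X, y, snvs, alpha)
                              + log_structure_prior(snvs, priors))
    # Normalize posterior model weights with the log-sum-exp trick
    best = max(log_scores)
    weights = [math.exp(s - best) for s in log_scores]
    z = sum(weights)
    inclusion = [0.0] * n_snvs
    for snvs, w in zip(models, weights):
        for j in snvs:
            inclusion[j] += w / z
    order = sorted(range(n_snvs), key=lambda j: inclusion[j], reverse=True)
    return order, inclusion


if __name__ == "__main__":
    # Toy data: rows are samples, columns are SNV indicators; SNV 0 tracks the outcome
    X = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1], [0, 1, 0]]
    y = [1, 1, 1, 0, 0, 0]
    order, inclusion = rank_snvs(X, y, priors=[0.5, 0.5, 0.5])
    print(order, [round(p, 3) for p in inclusion])
```

With equal per-SNV priors of 0.5 the structure prior is the same for every model, so the ranking is driven by the data alone; raising an SNV's prior increases the weight of every model that includes it. Exhaustive enumeration is only practical for a handful of SNVs, so an exome-scale search would need a heuristic search over the model space.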

The BAPR algorithm performed statistically significantly better than the chi-square test and random forests at identifying previously known disease-associated SNVs and biologically meaningful SNVs. BAPR with uniform priors and with expected-number-of-predictors priors performed better than BAPR with priors derived from variant deleteriousness scores. In addition, combining several variant deleteriousness scores performed at least as well as the best-performing single deleteriousness score. Variant deleteriousness scores have sparse coverage: scores are typically available for only a small proportion of the SNVs measured in an exome dataset. The encouraging results obtained with these scores suggest that, as their coverage increases, the performance of algorithms such as BAPR that incorporate them will also improve.
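Because deleteriousness scores cover only some SNVs, a structure prior built from them needs a fallback for unscored variants. The Python sketch below shows one hypothetical way to combine several sparse score tables into per-SNV prior probabilities. It rank-normalizes each table, averages whichever normalized ranks are available for an SNV, maps that average linearly into [p_min, p_max], and gives SNVs with no score a uniform default. The averaging rule, the bounds, the default of 0.5, and the function name `combined_structure_priors` are assumptions for illustration, not the combination method evaluated in the dissertation.

```python
from typing import Dict, List


def combined_structure_priors(
    score_tables: List[Dict[str, float]],
    snv_ids: List[str],
    default_prior: float = 0.5,
    p_min: float = 0.1,
    p_max: float = 0.9,
) -> List[float]:
    """Turn several sparse deleteriousness-score tables into per-SNV priors.

    Each table maps an SNV id to a raw score (assumed: higher = more deleterious).
    Scores are rank-normalized within each table so that tools on different
    scales can be averaged. SNVs missing from every table get `default_prior`.
    """
    normalized: List[Dict[str, float]] = []
    for table in score_tables:
        ranked = sorted(table, key=table.get)  # ascending by raw score
        n = len(ranked)
        # Mid-rank normalization into (0, 1); a single-entry table maps to 0.5
        normalized.append({snv: (i + 0.5) / n for i, snv in enumerate(ranked)})

    priors = []
    for snv in snv_ids:
        available = [t[snv] for t in normalized if snv in t]
        if not available:
            priors.append(default_prior)  # no score: fall back to a uniform prior
        else:
            mean_rank = sum(available) / len(available)
            priors.append(p_min + (p_max - p_min) * mean_rank)
    return priors


if __name__ == "__main__":
    # Two hypothetical tools on different scales; rs4 is scored by neither
    tables = [{"rs1": 0.9, "rs2": 0.1}, {"rs1": 25.0, "rs3": 3.0}]
    print(combined_structure_priors(tables, ["rs1", "rs2", "rs3", "rs4"]))
```

In this example rs1 is ranked most deleterious by both tools and receives the highest prior, while rs4, which no tool scores, keeps the default. Rank normalization is used here only because raw scores from different tools are not on a common scale.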




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Wong, An-kwok Ian (Email: ian@aiwong.com; Pitt Username: AIW5)
ETD Committee:
Committee Chair: Visweswaran, Shyam (Email: shv3@pitt.edu; Pitt Username: SHV3)
Committee Member: Cooper, Gregory F (Email: gfc@cbmi.pitt.edu; Pitt Username: GFC)
Committee Member: Hauskrecht, Milos (Email: milos@cs.pitt.edu; Pitt Username: MILOS)
Committee Member: Barmada, M. Michael (Email: barmada@pitt.edu; Pitt Username: BARMADA)
Date: 3 October 2016
Date Type: Publication
Defense Date: 27 June 2016
Approval Date: 3 October 2016
Submission Date: 29 June 2016
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Number of Pages: 113
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Intelligent Systems
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: BAPR, biomarker discovery, exome, exome data, biomarker recovery
Additional Information: Final draft
Date Deposited: 03 Oct 2016 20:34
Last Modified: 03 Oct 2021 05:15

