Wong, An-kwok Ian
(2016)
Biomarker Discovery in Exome Data.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Current DNA sequencing technology enables inexpensive sequencing of the exome, the protein-coding regions of the genome. The primary goal of exome data analysis is to identify sequence variants, such as single nucleotide variations (SNVs), that will help elucidate the genetic causes of common polygenic diseases such as Alzheimer's disease and chronic pancreatitis. Exome data analysis presents several challenges, including the large number of SNVs relative to the small sample size, the rarity of many of the SNVs, and potential interactions among SNVs in their effects on disease.
In this work, I develop, implement, and evaluate a new multivariate biomarker ranking algorithm called Bayesian averaged probabilistic rules (BAPR) that has several novel characteristics. It (1) learns probabilistic rule models from data, (2) performs Bayesian model averaging to rank biomarkers like SNVs, and (3) incorporates biological knowledge as structure priors of biomarkers. The BAPR algorithm was evaluated on several exome datasets with both synthetic outcomes and real outcomes, and using a range of variant deleteriousness scores as structure priors. The quality of SNV rankings was evaluated with biomarker recovery plots, area under the Receiver Operating Characteristic curves, and evidence of biological validity as supported by the literature.
The BAPR algorithm performed statistically significantly better than chi-square and random forests in identifying previously known disease-associated SNVs and biologically meaningful SNVs. BAPR with uniform priors and with expected-number-of-predictors priors performed better than BAPR with priors derived from variant deleteriousness scores. Also, combining several variant deleteriousness scores performed at least as well as the best-performing single deleteriousness score. The variant deleteriousness scores have sparse coverage: scores are typically available for only a small proportion of the SNVs measured in an exome dataset. The encouraging results obtained with these scores suggest that, as coverage of the scores increases, the performance of algorithms like BAPR that incorporate them will also improve.
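The general idea of ranking SNVs by Bayesian model averaging with a structure prior can be illustrated with a toy sketch. This is not the BAPR algorithm from the dissertation. It assumes binary SNVs and a binary outcome, and it enumerates every model of one or two SNVs. Each model is scored with a Beta(1, 1) marginal likelihood per genotype configuration, and each SNV is ranked by the total posterior weight of the models that contain it. The function names, the `prior` weights, and the `max_size` limit are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def log_marginal_likelihood(rows, outcome, snvs):
    """Log marginal likelihood of the outcome given the chosen SNVs.

    Each joint genotype configuration acts like one rule antecedent with its
    own outcome probability, integrated out under a Beta(1, 1) prior.
    """
    counts = {}
    for row, y in zip(rows, outcome):
        key = tuple(row[j] for j in snvs)
        cell = counts.setdefault(key, [0, 0])
        cell[y] += 1  # cell[0] counts controls, cell[1] counts cases
    return sum(
        math.lgamma(n0 + 1) + math.lgamma(n1 + 1) - math.lgamma(n0 + n1 + 2)
        for n0, n1 in counts.values()
    )


def rank_snvs(rows, outcome, prior=None, max_size=2):
    """Rank SNVs by posterior inclusion probability over small models.

    prior holds one positive weight per SNV (an assumed stand-in for a
    structure prior); a model's prior is the product of its SNV weights.
    """
    n_snvs = len(rows[0])
    prior = prior or [1.0] * n_snvs  # uniform weights by default
    models, log_scores = [], []
    for size in range(1, max_size + 1):
        for snvs in combinations(range(n_snvs), size):
            log_prior = sum(math.log(prior[j]) for j in snvs)
            models.append(snvs)
            log_scores.append(
                log_prior + log_marginal_likelihood(rows, outcome, snvs)
            )
    # Normalize in a numerically stable way to get model posteriors.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    # Model averaging: add up the posterior of every model containing SNV j.
    inclusion = [0.0] * n_snvs
    for snvs, w in zip(models, weights):
        for j in snvs:
            inclusion[j] += w / total
    ranking = sorted(range(n_snvs), key=lambda j: inclusion[j], reverse=True)
    return ranking, inclusion


# Toy data: SNV 0 matches the outcome exactly; SNVs 1 and 2 are uninformative.
rows = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1],
        [0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 1, 1]]
outcome = [1, 1, 1, 1, 0, 0, 0, 0]
ranking, inclusion = rank_snvs(rows, outcome)
```

With uniform weights the informative SNV is ranked first. Raising the weight of another SNV, as a prior derived from a deleteriousness score might, increases that SNV's inclusion probability. Exhaustive enumeration is only feasible for a handful of SNVs, so exome-scale data would need a search strategy instead.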
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Wong, An-kwok Ian
Date: 3 October 2016
Date Type: Publication
Defense Date: 27 June 2016
Approval Date: 3 October 2016
Submission Date: 29 June 2016
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Number of Pages: 113
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Intelligent Systems
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: BAPR, biomarker discovery, exome, exome data, biomarker recovery
Additional Information: Final draft
Date Deposited: 03 Oct 2016 20:34
Last Modified: 03 Oct 2021 05:15
URI: http://d-scholarship.pitt.edu/id/eprint/28415