
A BAYESIAN APPROACH TO LEARNING DECISION TREES FOR PATIENT-SPECIFIC MODELS

Dutta-Moscato, Joyeeta (2018) A BAYESIAN APPROACH TO LEARNING DECISION TREES FOR PATIENT-SPECIFIC MODELS. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

PDF (2MB), restricted to University of Pittsburgh users only until 28 August 2019.

Abstract

A principal goal of precision medicine is to identify genomic factors that predict outcomes in complex diseases and so provide better insight into their molecular mechanisms. Current understanding suggests that many genomic factors are likely pathogenic in small subpopulations while being rare in the population as a whole. This research introduces a new machine learning method for discovering single nucleotide variants (SNVs), both common and rare, that predict whether a given person will develop a disease or a particular disease outcome.
The new method described in this research constructs decision tree models, evaluates them with a Bayesian score, and uses a person-specific search strategy to identify SNVs that are predictive in a subpopulation whose members are similar to the person of interest. This method, called the Personalized Decision Tree Algorithm (PDTA), first constructs a decision tree model from the data and then identifies a path in the tree that predicts the outcome well for the person of interest; if no existing path predicts well, it constructs a new one.
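The abstract does not spell out the PDTA's scoring or search details, so the Python sketch below is only a hypothetical, simplified illustration of the two ideas it names: a Bayesian score for the leaf partition of a tree (BDeu, the score listed among the keywords) and a person-specific search that grows a single path by repeatedly splitting on the SNV value the person of interest carries. It assumes binary SNV indicators, a binary outcome, an equivalent sample size `alpha` divided evenly over leaves and outcome classes, and no structure prior. The names `bdeu_leaf_score`, `bdeu_partition_score`, `class_counts`, and `personalized_path` are invented for this example and are not the dissertation's implementation; the sketch also omits the first stage (building a full tree and checking its existing paths).

```python
import math
from typing import List, Sequence, Tuple


def bdeu_leaf_score(counts: Sequence[int], alpha_leaf: float) -> float:
    """Log marginal likelihood of one leaf's outcome counts under a symmetric
    Dirichlet prior whose total weight alpha_leaf is split evenly over classes."""
    alpha_k = alpha_leaf / len(counts)
    score = math.lgamma(alpha_leaf) - math.lgamma(alpha_leaf + sum(counts))
    for n_k in counts:
        score += math.lgamma(alpha_k + n_k) - math.lgamma(alpha_k)
    return score


def bdeu_partition_score(leaves: List[List[int]], alpha: float = 1.0) -> float:
    """BDeu-style score of a set of leaves: the equivalent sample size alpha
    is divided evenly among the leaves (treated as parent configurations)."""
    alpha_leaf = alpha / len(leaves)
    return sum(bdeu_leaf_score(c, alpha_leaf) for c in leaves)


def class_counts(y: Sequence[int], idx: List[int]) -> List[int]:
    """Counts of outcome 0 and outcome 1 among the samples listed in idx."""
    counts = [0, 0]
    for i in idx:
        counts[y[i]] += 1
    return counts


def personalized_path(
    X: Sequence[Sequence[int]], y: Sequence[int], x_star: Sequence[int], alpha: float = 1.0
) -> Tuple[List[Tuple[int, int]], float]:
    """Greedily grow one path toward the person x_star (hypothetical sketch).

    At each step, try splitting the current subpopulation on every unused SNV
    into "same value as x_star" vs "different value"; keep the split with the
    largest BDeu gain over not splitting, and follow the branch matching x_star.
    Stop when no split improves the score.
    """
    idx = list(range(len(y)))
    used = set()
    path: List[Tuple[int, int]] = []
    while True:
        base = bdeu_partition_score([class_counts(y, idx)], alpha)
        best_gain, best_j = 0.0, None
        for j in range(len(x_star)):
            if j in used:
                continue
            match = [i for i in idx if X[i][j] == x_star[j]]
            other = [i for i in idx if X[i][j] != x_star[j]]
            if not match or not other:
                continue  # this split would not separate anyone
            split = bdeu_partition_score([class_counts(y, match), class_counts(y, other)], alpha)
            if split - base > best_gain:
                best_gain, best_j = split - base, j
        if best_j is None:
            break
        used.add(best_j)
        path.append((best_j, x_star[best_j]))
        idx = [i for i in idx if X[i][best_j] == x_star[best_j]]
    # Posterior mean of P(outcome = 1) in the person's final subpopulation.
    counts = class_counts(y, idx)
    p_outcome = (counts[1] + alpha / 2) / (sum(counts) + alpha)
    return path, p_outcome


# Toy data: SNV 0 fully determines the outcome, SNV 1 is noise.
X = [[i % 2, (i // 2) % 2] for i in range(20)]
y = [row[0] for row in X]
path, p = personalized_path(X, y, x_star=[1, 0])
print(path)  # → [(0, 1)]
```

Comparing the split and no-split scores on the same subpopulation is a local Bayesian model comparison: a split is kept only when it raises the marginal likelihood, which tends to keep the resulting path short, in the spirit of the parsimonious representations the abstract describes.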
The PDTA was refined iteratively on synthetic data and was experimentally evaluated on five datasets: one synthetic, one semi-synthetic, and three biological datasets collected from patients with chronic pancreatitis (a small genomic dataset, a whole exome dataset, and a whole exome dataset focused on patients with diabetes in chronic pancreatitis). Performance was evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the F1 score, as well as the ability to retrieve known and unknown rare SNVs. The PDTA was effective to varying degrees across the evaluated datasets, creating parsimonious genetic representations for patient-specific groups, with the potential to discover novel variants.
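The two evaluation metrics named in the abstract can be computed as in the following minimal Python sketch. The AUC uses its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half), and F1 uses a decision threshold of 0.5. The threshold, the function names `roc_auc` and `f1_score`, and the toy data are assumptions for illustration; the abstract does not state the evaluation protocol.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Harmonic mean of precision and recall after thresholding the scores."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, t in zip(pred, y_true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, y_true) if p == 0 and t == 1)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [0, 0, 1, 1]
scores = [0.1, 0.6, 0.4, 0.8]
print(roc_auc(y_true, scores))   # → 0.75
print(f1_score(y_true, scores))  # → 0.5
```

The pairwise AUC is quadratic in the number of cases, which is fine for an illustration; a sort-based rank computation gives the same value in O(n log n) for larger datasets.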



Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Dutta-Moscato, Joyeeta (email: jod30@pitt.edu; Pitt username: jod30)
ETD Committee:
Thesis Advisor: Visweswaran, Shyam (email: shv3@pitt.edu; Pitt username: shv3)
Committee Member: Becich, Michael (email: becich@pitt.edu; Pitt username: becich)
Committee Member: Lu, Xinghua (email: xinghua@pitt.edu; Pitt username: xinghua)
Committee Member: Whitcomb, David (email: whitcomb@pitt.edu; Pitt username: whitcomb)
Date: 28 August 2018
Date Type: Publication
Defense Date: 2 August 2018
Approval Date: 28 August 2018
Submission Date: 28 August 2018
Access Restriction: 1 year (access restricted to the University of Pittsburgh for a period of 1 year)
Number of Pages: 134
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Biomedical Informatics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: rare variant, SNP discovery, BDeu, personalized medicine, precision medicine
Date Deposited: 28 Aug 2018 19:37
Last Modified: 28 Aug 2018 19:37
URI: http://d-scholarship.pitt.edu/id/eprint/35269
