Lustgarten, Jonathan Llyle
(2009)
A Bayesian Rule Generation Framework for 'Omic' Biomedical Data Analysis.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
High-dimensional biomedical 'omic' datasets are accumulating rapidly from studies aimed at early detection and better management of human disease. These datasets pose tremendous challenges for analysis because they contain a large number of variables, representing measurements of biochemical molecules such as proteins and mRNA from bodily fluids or tissues, collected from a rather small cohort of samples. Machine learning methods have been applied to modeling these datasets, including rule learning methods, which have been successful in generating models that are easily interpretable by scientists. Rule learning methods have typically relied on a frequentist measure of certainty within IF-THEN (propositional) rules. In this dissertation, a Bayesian Rule Generation Framework (BRGF) is developed and tested that can produce rules with probabilities, thereby enabling a mathematically rigorous representation of uncertainty in rule models. The BRGF includes a novel Bayesian Discretization method combined with one or more search strategies for building constrained Bayesian Networks from data and converting them into probabilistic rules. Both global and local structures are built using different Bayesian Network generation algorithms, and the rule models generated from the networks are tested on public and private 'omic' datasets. We show that using a specific type of structure (Bayesian decision graphs) in tandem with a specific type of search method (parallel greedy) achieves statistically significantly higher overall performance than current state-of-the-art rule learning methods. Using the BRGF not only yields a statistically significant average performance gain on 'omic' biomedical data, but also provides the ability to incorporate prior information in a mathematically rigorous fashion for modeling purposes.
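The contrast the abstract draws between a frequentist measure of certainty and a rule probability can be illustrated with a minimal sketch. This is not the BRGF or any algorithm from the dissertation; it only assumes, for illustration, a single IF-THEN rule whose certainty is estimated either as a raw proportion or as a posterior mean under a Beta prior. The function names, the counts, and the choice of a Beta(1, 1) prior are all hypothetical.

```python
from fractions import Fraction


def frequentist_confidence(matches: int, covered: int) -> Fraction:
    """Proportion of samples covered by the rule's IF part for which
    the THEN part also holds (a plain relative frequency)."""
    if covered == 0:
        raise ValueError("rule covers no samples")
    return Fraction(matches, covered)


def posterior_mean(matches: int, covered: int,
                   prior_pos: int = 1, prior_neg: int = 1) -> Fraction:
    """Posterior mean of P(THEN | IF) under an assumed
    Beta(prior_pos, prior_neg) prior; the prior counts are where
    background knowledge could enter in this toy setting."""
    return Fraction(matches + prior_pos, covered + prior_pos + prior_neg)


# Hypothetical counts: the rule covers 3 samples and is right on all 3.
print(frequentist_confidence(3, 3))  # relative frequency
print(posterior_mean(3, 3))          # estimate shrunk toward the prior
```

With only three covered samples, the relative frequency reports complete certainty, while the posterior mean stays below 1. That difference matters when a dataset has many variables and few samples, which is the setting the abstract describes.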
Details
Field | Value |
---|---|
Item Type | University of Pittsburgh ETD |
Status | Unpublished |
Creators/Authors | Lustgarten, Jonathan Llyle (Email: jll47@pitt.edu; Pitt Username: JLL47) |
Date | 14 May 2009 |
Date Type | Completion |
Defense Date | 9 April 2009 |
Approval Date | 14 May 2009 |
Submission Date | 6 March 2009 |
Access Restriction | 5 year -- Restrict access to University of Pittsburgh for a period of 5 years. |
Institution | University of Pittsburgh |
Schools and Programs | School of Medicine > Biomedical Informatics |
Degree | PhD - Doctor of Philosophy |
Thesis Type | Doctoral Dissertation |
Refereed | Yes |
Uncontrolled Keywords | Bayesian Networks; Biomedical Data; Genomics; Probabilistic Rules; Proteomics; Rule Learning |
Other ID | http://etd.library.pitt.edu/ETD/available/etd-03062009-175216/, etd-03062009-175216 |
Date Deposited | 10 Nov 2011 19:32 |
Last Modified | 15 Nov 2016 13:36 |
URI | http://d-scholarship.pitt.edu/id/eprint/6441 |