A Bayesian Rule Generation Framework for 'Omic' Biomedical Data Analysis

Lustgarten, Jonathan Llyle (2009) A Bayesian Rule Generation Framework for 'Omic' Biomedical Data Analysis. Doctoral Dissertation, University of Pittsburgh. (Unpublished)


High-dimensional biomedical 'omic' datasets are accumulating rapidly from studies aimed at early detection and better management of human disease. These datasets are challenging to analyze because they contain a large number of variables, each a measurement of a biochemical molecule such as a protein or mRNA in bodily fluids or tissues, collected from a rather small cohort of samples. Machine learning methods have been applied to modeling these datasets, including rule learning methods, which have been successful in generating models that scientists can easily interpret. Rule learning methods have typically relied on a frequentist measure of certainty within IF-THEN (propositional) rules. In this dissertation, a Bayesian Rule Generation Framework (BRGF) is developed and tested that produces rules with probabilities, thereby enabling a mathematically rigorous representation of uncertainty in rule models. The BRGF combines a novel Bayesian Discretization method with one or more search strategies for building constrained Bayesian Networks from data and converting them into probabilistic rules. Both global and local structures are built using different Bayesian Network generation algorithms, and the rule models generated from the networks are tested on public and private 'omic' datasets. We show that using a specific type of structure (Bayesian decision graphs) in tandem with a specific type of search method (parallel greedy) achieves statistically significantly higher overall performance than current state-of-the-art rule learning methods. The BRGF not only yields a statistically significant average performance gain on 'omic' biomedical data, but also provides the ability to incorporate prior information in a mathematically rigorous fashion for modeling purposes.
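
The contrast between a frequentist certainty measure and a rule probability that incorporates prior information can be illustrated with a small sketch. This is not the BRGF's scoring method, which the abstract does not specify. It assumes a simple Beta prior over P(THEN | IF), and the function names, sample counts, and prior pseudo-counts are all hypothetical.

```python
from fractions import Fraction


def frequentist_confidence(matches: int, correct: int) -> Fraction:
    """Share of samples matching the IF-part whose class agrees with the THEN-part."""
    if matches <= 0 or not 0 <= correct <= matches:
        raise ValueError("need matches > 0 and 0 <= correct <= matches")
    return Fraction(correct, matches)


def posterior_rule_probability(
    matches: int, correct: int, prior_pos: int = 1, prior_neg: int = 1
) -> Fraction:
    """Posterior mean of P(THEN | IF) under an assumed Beta(prior_pos, prior_neg) prior.

    prior_pos and prior_neg are hypothetical pseudo-counts standing in for
    prior information; Beta(1, 1) is the uniform prior.
    """
    if matches < 0 or not 0 <= correct <= matches:
        raise ValueError("need matches >= 0 and 0 <= correct <= matches")
    return Fraction(correct + prior_pos, matches + prior_pos + prior_neg)


if __name__ == "__main__":
    # Hypothetical rule that matches 4 samples, all 4 with the predicted class.
    print(frequentist_confidence(4, 4))            # 1
    print(posterior_rule_probability(4, 4))        # 5/6
    print(posterior_rule_probability(4, 4, 2, 8))  # 3/7
```

With only four matching samples, the frequentist measure reports complete certainty, while the posterior estimate stays below 1 and shifts further when the assumed prior disagrees with the data. This is the kind of behavior that matters when the cohort is small relative to the number of variables.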



Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Lustgarten, Jonathan Llyle (Email: jll47@pitt.edu; Pitt Username: JLL47)
ETD Committee:
Committee Chair: Gopalakrishnan, Vanathi (Email: vanathi@pitt.edu; Pitt Username: VANATHI)
Committee Member: Bowser, Robert
Committee Member: Visweswaran, Shyam (Email: shv3@pitt.edu; Pitt Username: SHV3)
Committee Member: Hogan, William
Date: 14 May 2009
Date Type: Completion
Defense Date: 9 April 2009
Approval Date: 14 May 2009
Submission Date: 6 March 2009
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Biomedical Informatics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Bayesian Networks; Biomedical Data; Genomics; Probabilistic Rules; Proteomics; Rule Learning
Other ID: etd-03062009-175216
Date Deposited: 10 Nov 2011 19:32
Last Modified: 15 Nov 2016 13:36

