Application of an efficient Bayesian discretization method to biomedical data

Lustgarten, JL and Visweswaran, S and Gopalakrishnan, V and Cooper, GF (2011) Application of an efficient Bayesian discretization method to biomedical data. BMC Bioinformatics, 12.

Abstract

Background: Several data mining methods require data that are discrete, and other methods often perform better with discrete data. We introduce an efficient Bayesian discretization (EBD) method for optimal discretization of variables that runs efficiently on high-dimensional biomedical datasets. The EBD method consists of two components, namely, a Bayesian score to evaluate discretizations and a dynamic programming search procedure to efficiently search the space of possible discretizations. We compared the performance of EBD to Fayyad and Irani's (FI) discretization method, which is commonly used for discretization.

Results: On 24 biomedical datasets obtained from high-throughput transcriptomic and proteomic studies, the classification performances of the C4.5 classifier and the naïve Bayes classifier were statistically significantly better when the predictor variables were discretized using EBD over FI. EBD was statistically significantly more stable to the variability of the datasets than FI. However, EBD was less robust, though not statistically significantly so, than FI and produced slightly more complex discretizations than FI.

Conclusions: On a range of biomedical datasets, a Bayesian discretization method (EBD) yielded better classification performance and stability but was less robust than the widely used FI discretization method. The EBD discretization method is easy to implement, permits the incorporation of prior knowledge and belief, and is sufficiently fast for application to high-dimensional data.

© 2011 Lustgarten et al; licensee BioMed Central Ltd.
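The abstract describes EBD as a Bayesian score for evaluating discretizations combined with a dynamic programming search over the space of possible discretizations. The Python sketch below illustrates that general pattern under stated assumptions; it is not the authors' implementation. It assumes the score decomposes into a sum over intervals, scores the class counts in each interval with a Dirichlet-multinomial marginal likelihood (a K2/BDeu-style choice), and adds a hypothetical constant log-prior `interval_penalty` per interval in place of the paper's actual prior. The function names `interval_log_ml` and `discretize` are illustrative only. Because the assumed score is additive over intervals, the best discretization of the first j sorted instances extends the best discretization of some shorter prefix, which gives a search over O(n²) candidate intervals whose boundaries fall only between distinct values.

```python
from math import lgamma, log
from typing import Dict, List, Sequence, Tuple


def interval_log_ml(counts: Sequence[int], alpha: float = 1.0) -> float:
    """Log marginal likelihood of the class counts in one interval under a
    symmetric Dirichlet(alpha) prior (an assumed K2/BDeu-style choice)."""
    k = len(counts)
    n = sum(counts)
    score = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts:
        score += lgamma(alpha + c) - lgamma(alpha)
    return score


def discretize(values: Sequence[float], labels: Sequence[int], n_classes: int,
               alpha: float = 1.0,
               interval_penalty: float = log(0.1)) -> Tuple[List[float], float]:
    """Return (cut points, best log score) for one continuous variable.

    Hypothetical: interval_penalty is a constant log-prior added once per
    interval; it stands in for the paper's actual prior on discretizations."""
    n = len(values)
    if n == 0:
        return [], 0.0
    order = sorted(range(n), key=lambda i: values[i])
    xs = [values[i] for i in order]
    ys = [labels[i] for i in order]

    # prefix[j][c] = number of class-c instances among the first j sorted ones,
    # so the counts of any interval [a, b) are a difference of two rows.
    prefix = [[0] * n_classes]
    for y in ys:
        row = prefix[-1][:]
        row[y] += 1
        prefix.append(row)

    # Interval boundaries may only fall between distinct adjacent values.
    bounds = [0] + [j for j in range(1, n) if xs[j - 1] != xs[j]] + [n]

    best: Dict[int, float] = {0: 0.0}  # best score of each discretized prefix
    back: Dict[int, int] = {}          # start index of that prefix's last interval
    for bi in range(1, len(bounds)):
        b = bounds[bi]
        best_score, best_start = float("-inf"), 0
        for a in bounds[:bi]:
            counts = [prefix[b][c] - prefix[a][c] for c in range(n_classes)]
            s = best[a] + interval_log_ml(counts, alpha) + interval_penalty
            if s > best_score:
                best_score, best_start = s, a
        best[b], back[b] = best_score, best_start

    # Follow the back-pointers from the end to recover the chosen cut points.
    cuts: List[float] = []
    b = n
    while b != 0:
        a = back[b]
        if a != 0:
            cuts.append((xs[a - 1] + xs[a]) / 2)
        b = a
    return sorted(cuts), best[n]
```

For example, `discretize(list(range(1, 21)), [0] * 10 + [1] * 10, n_classes=2)` places a single cut point at 10.5 with these assumed settings. Raising `interval_penalty` toward zero or changing `alpha` shifts how many intervals the search prefers, which is where prior knowledge would enter in a Bayesian formulation.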

Details

Item Type: Article
Status: Published
Creators/Authors:
Lustgarten, JL
Visweswaran, S (Email: shv3@pitt.edu; Pitt Username: SHV3)
Gopalakrishnan, V (Email: vanathi@pitt.edu; Pitt Username: VANATHI)
Cooper, GF (Email: gfc@pitt.edu; Pitt Username: GFC)
Date: 28 July 2011
Date Type: Publication
Journal or Publication Title: BMC Bioinformatics
Volume: 12
DOI or Unique Handle: 10.1186/1471-2105-12-309
Schools and Programs: Dietrich School of Arts and Sciences > Intelligent Systems; School of Medicine > Biomedical Informatics
Refereed: Yes
Date Deposited: 31 Oct 2016 16:37
Last Modified: 04 Feb 2019 15:58
URI: http://d-scholarship.pitt.edu/id/eprint/30030