
ALGORITHMS FOR CONSTRAINT-BASED LEARNING OF BAYESIAN NETWORK STRUCTURES WITH LARGE NUMBERS OF VARIABLES

de Jongh, Martijn (2014) ALGORITHMS FOR CONSTRAINT-BASED LEARNING OF BAYESIAN NETWORK STRUCTURES WITH LARGE NUMBERS OF VARIABLES. Doctoral Dissertation, University of Pittsburgh. (Unpublished)


Primary Text: PDF (14 MB)

Abstract

Bayesian networks (BNs) are highly practical and successful tools for modeling probabilistic knowledge. They can be constructed by an expert, learned from data, or built by a combination of the two. A popular approach to learning the structure of a BN is constraint-based search (CBS), with the PC algorithm being a prominent example. In recent years, we have been experiencing a data deluge; we have access to more data, big and small, than ever before. The exponential nature of BN algorithms, however, hinders large-scale analysis. Developments in parallel and distributed computing have made the computational power required for large-scale data processing widely available, yielding opportunities to develop parallel and distributed algorithms for BN learning and inference. In this dissertation, (1) I propose two MapReduce versions of the PC algorithm, aimed at an increasingly common case: data that is not necessarily massive in the number of records, but increasingly so in the number of variables. (2) When the number of data records is small, the PC algorithm runs into problems in independence testing. I empirically explore a contradiction in the literature on how to resolve the case of insufficient data when testing the independence of two variables: declare independence or declare dependence. (3) When BNs learned from data become complex in terms of graph density, they may require more parameters than we can feasibly store. I propose and evaluate five approaches to pruning a BN structure to guarantee that it remains tractable for storage and inference, and I follow this by proposing three approaches to improving the classification accuracy of a BN by modifying its structure.
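The second contribution above turns on a concrete decision point: when an independence test in a PC-style learner has too few records to be reliable, should the learner declare independence or dependence? The Python sketch below is illustrative only; the adequacy heuristic (a minimum average count per contingency-table cell) and the function name are assumptions made for exposition, not the tests or thresholds used in the dissertation.

```python
# Illustrative sketch only -- not the dissertation's implementation.
# A marginal chi-square independence test for a PC-style learner, with an
# explicit policy for the "insufficient data" case discussed in the abstract.
import numpy as np
from scipy.stats import chi2


def looks_independent(data, x, y, alpha=0.05, min_avg_cell=5,
                      insufficient_means_independent=True):
    """Decide whether columns x and y of a discrete data matrix look independent.

    `insufficient_means_independent` encodes the contested choice: when the
    contingency table is too sparse to test reliably, declare independence
    (True) or dependence (False).
    """
    xs, xi = np.unique(data[:, x], return_inverse=True)
    ys, yi = np.unique(data[:, y], return_inverse=True)
    counts = np.zeros((xs.size, ys.size))
    np.add.at(counts, (xi, yi), 1)           # build the contingency table

    # Heuristic adequacy check (assumption): require a minimum average
    # number of records per cell before trusting the chi-square test.
    if counts.sum() < min_avg_cell * counts.size:
        return insufficient_means_independent

    dof = (xs.size - 1) * (ys.size - 1)
    if dof == 0:                             # a constant column is trivially independent
        return True

    expected = counts.sum(1, keepdims=True) * counts.sum(0, keepdims=True) / counts.sum()
    stat = ((counts - expected) ** 2 / expected).sum()
    return chi2.sf(stat, dof) > alpha        # large p-value: do not reject independence


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.integers(0, 3, size=(1000, 2))    # two unrelated ternary variables
    print(looks_independent(data, 0, 1))          # ample data: test decides
    print(looks_independent(data[:10], 0, 1))     # sparse data: policy decides
```

Making the sparse-data policy an explicit parameter is one way to reproduce the kind of empirical comparison the abstract describes: run the same structure-learning procedure under each setting and compare the resulting networks.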



Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: de Jongh, Martijn (m.a.dejongh@gmail.com)
ETD Committee:
Committee Chair: Druzdzel, Marek J. (marek@sis.pitt.edu, DRUZDZEL)
Committee Member: Hirtle, Stephen C. (hirtle@pitt.edu, HIRTLE)
Committee Member: Karimi, Hassan A. (hkarimi@pitt.edu, HKARIMI)
Committee Member: Cooper, Gregory F. (gfc@pitt.edu, GFC)
Committee Member: Dash, Denver H. (denver.dash@gmail.com)
Date: 12 June 2014
Date Type: Publication
Defense Date: 22 April 2014
Approval Date: 12 June 2014
Submission Date: 4 June 2014
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 196
Institution: University of Pittsburgh
Schools and Programs: School of Information Sciences > Information Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Information Science, Computer Science, Artificial Intelligence, Bayesian Networks, MapReduce, Distributed Computing
Date Deposited: 12 Jun 2014 19:54
Last Modified: 15 Nov 2016 14:20
URI: http://d-scholarship.pitt.edu/id/eprint/21786

Available Versions of this Item

  • ALGORITHMS FOR CONSTRAINT-BASED LEARNING OF BAYESIAN NETWORK STRUCTURES WITH LARGE NUMBERS OF VARIABLES. (deposited 12 Jun 2014 19:54)
