de Jongh, Martijn
(2014)
ALGORITHMS FOR CONSTRAINT-BASED LEARNING OF BAYESIAN NETWORK STRUCTURES WITH LARGE NUMBERS OF VARIABLES.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
This is the latest version of this item.
Abstract
Bayesian networks (BNs) are highly practical and successful tools for modeling probabilistic knowledge. They can be constructed by an expert, learned from data, or by a combination of the two. A popular approach to learning the structure of a BN is the constraint-based search (CBS) approach, with the PC algorithm being a prominent example. In recent years, we have been experiencing a data deluge. We have access to more data, big and small, than ever before. The exponential nature of BN algorithms, however, hinders large-scale analysis. Developments in parallel and distributed computing have made the computational power required for large-scale data processing widely available, yielding opportunities for developing parallel and distributed algorithms for BN learning and inference. In this dissertation, (1) I propose two MapReduce versions of the PC algorithm, aimed at solving an increasingly common case: data is not necessarily massive in the number of records, but more and more so in the number of variables. (2) When the number of data records is small, the PC algorithm experiences problems in independence testing. Empirically, I explore a contradiction in the literature on how to resolve the case of having insufficient data when testing the independence of two variables: declare independence or dependence. (3) When BNs learned from data become complex in terms of graph density, they may require more parameters than we can feasibly store. I propose and evaluate five approaches to pruning a BN structure to guarantee that it will be tractable for storage and inference. I follow this up by proposing three approaches to improving the classification accuracy of a BN by modifying its structure.
Share
Citation/Export: |
|
Social Networking: |
|
Details
Item Type: |
University of Pittsburgh ETD
|
Status: |
Unpublished |
Creators/Authors: |
|
ETD Committee: |
|
Date: |
12 June 2014 |
Date Type: |
Publication |
Defense Date: |
22 April 2014 |
Approval Date: |
12 June 2014 |
Submission Date: |
4 June 2014 |
Access Restriction: |
No restriction; Release the ETD for access worldwide immediately. |
Number of Pages: |
196 |
Institution: |
University of Pittsburgh |
Schools and Programs: |
School of Information Sciences > Information Science |
Degree: |
PhD - Doctor of Philosophy |
Thesis Type: |
Doctoral Dissertation |
Refereed: |
Yes |
Uncontrolled Keywords: |
Information Science, Computer Science, Artificial Intelligence, Bayesian Networks, Map Reduce, and Distributed Computing |
Date Deposited: |
12 Jun 2014 19:54 |
Last Modified: |
15 Nov 2016 14:20 |
URI: |
http://d-scholarship.pitt.edu/id/eprint/21786 |
Available Versions of this Item
-
ALGORITHMS FOR CONSTRAINT-BASED LEARNING OF BAYESIAN NETWORK STRUCTURES WITH LARGE NUMBERS OF VARIABLES. (deposited 12 Jun 2014 19:54)
[Currently Displayed]
Metrics
Monthly Views for the past 3 years
Plum Analytics
Actions (login required)
|
View Item |