
ALGORITHMS FOR CONSTRAINT-BASED LEARNING OF BAYESIAN NETWORK STRUCTURES WITH LARGE NUMBERS OF VARIABLES

de Jongh, Martijn (2014) ALGORITHMS FOR CONSTRAINT-BASED LEARNING OF BAYESIAN NETWORK STRUCTURES WITH LARGE NUMBERS OF VARIABLES. Doctoral Dissertation, University of Pittsburgh. (Unpublished)


Primary Text: PDF (14 MB)

Abstract

Bayesian networks (BNs) are highly practical and successful tools for modeling probabilistic knowledge. They can be constructed by an expert, learned from data, or built by a combination of the two. A popular approach to learning the structure of a BN is constraint-based search (CBS), with the PC algorithm being a prominent example. In recent years, we have been experiencing a data deluge; we have access to more data, big and small, than ever before. The exponential nature of BN algorithms, however, hinders large-scale analysis. Developments in parallel and distributed computing have made the computational power required for large-scale data processing widely available, yielding opportunities to develop parallel and distributed algorithms for BN learning and inference. In this dissertation, (1) I propose two MapReduce versions of the PC algorithm, aimed at an increasingly common case: data that is not necessarily massive in the number of records, but increasingly so in the number of variables. (2) When the number of data records is small, the PC algorithm runs into problems in independence testing. I empirically explore a contradiction in the literature on how to resolve the case of insufficient data when testing the independence of two variables: declare independence or declare dependence. (3) When BNs learned from data become complex in terms of graph density, they may require more parameters than we can feasibly store. I propose and evaluate five approaches to pruning a BN structure to guarantee that it remains tractable for storage and inference, and I follow this by proposing three approaches to improving the classification accuracy of a BN by modifying its structure.
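The second contribution above turns on a concrete decision point: when an independence test in a PC-style learner has too few records to be reliable, should the learner declare independence or dependence? The Python sketch below is illustrative only; the adequacy heuristic (a minimum average count per contingency-table cell) and the function name are assumptions made for exposition, not the tests or thresholds used in the dissertation.

```python
# Illustrative sketch only -- not the dissertation's implementation.
# A marginal chi-square independence test for a PC-style learner, with an
# explicit policy for the "insufficient data" case discussed in the abstract.
import numpy as np
from scipy.stats import chi2


def looks_independent(data, x, y, alpha=0.05, min_avg_cell=5,
                      insufficient_means_independent=True):
    """Decide whether columns x and y of a discrete data matrix look independent.

    `insufficient_means_independent` encodes the contested choice: when the
    contingency table is too sparse to test reliably, declare independence
    (True) or dependence (False).
    """
    xs, xi = np.unique(data[:, x], return_inverse=True)
    ys, yi = np.unique(data[:, y], return_inverse=True)
    counts = np.zeros((xs.size, ys.size))
    np.add.at(counts, (xi, yi), 1)           # build the contingency table

    # Heuristic adequacy check (assumption): require a minimum average
    # number of records per cell before trusting the chi-square test.
    if counts.sum() < min_avg_cell * counts.size:
        return insufficient_means_independent

    dof = (xs.size - 1) * (ys.size - 1)
    if dof == 0:                             # a constant column is trivially independent
        return True

    expected = counts.sum(1, keepdims=True) * counts.sum(0, keepdims=True) / counts.sum()
    stat = ((counts - expected) ** 2 / expected).sum()
    return chi2.sf(stat, dof) > alpha        # large p-value: do not reject independence


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.integers(0, 3, size=(1000, 2))    # two unrelated ternary variables
    print(looks_independent(data, 0, 1))          # ample data: test decides
    print(looks_independent(data[:10], 0, 1))     # sparse data: policy decides
```

Making the sparse-data policy an explicit parameter is one way to reproduce the kind of empirical comparison the abstract describes: run the same structure-learning procedure under each setting and compare the resulting networks.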



Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: de Jongh, Martijn (m.a.dejongh@gmail.com)
ETD Committee:
Committee Chair: Druzdzel, Marek J. (marek@sis.pitt.edu, DRUZDZEL)
Committee Member: Hirtle, Stephen C. (hirtle@pitt.edu, HIRTLE)
Committee Member: Karimi, Hassan A. (hkarimi@pitt.edu, HKARIMI)
Committee Member: Cooper, Gregory F. (gfc@pitt.edu, GFC)
Committee Member: Dash, Denver H. (denver.dash@gmail.com)
Date: 12 June 2014
Date Type: Publication
Defense Date: 22 April 2014
Approval Date: 12 June 2014
Submission Date: 4 June 2014
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 196
Institution: University of Pittsburgh
Schools and Programs: School of Information Sciences > Information Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Information Science, Computer Science, Artificial Intelligence, Bayesian Networks, MapReduce, Distributed Computing
Date Deposited: 12 Jun 2014 19:54
Last Modified: 15 Nov 2016 14:20
URI: http://d-scholarship.pitt.edu/id/eprint/21786

Available Versions of this Item

  • ALGORITHMS FOR CONSTRAINT-BASED LEARNING OF BAYESIAN NETWORK STRUCTURES WITH LARGE NUMBERS OF VARIABLES. (deposited 12 Jun 2014 19:54)
