Pakdaman Naeini, Mahdi
(2017)
OBTAINING ACCURATE PROBABILITIES USING CLASSIFIER CALIBRATION.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
This is the latest version of this item.
Abstract
Learning probabilistic classification and prediction models that generate accurate probabilities is essential in many prediction and decision-making tasks in machine learning and data mining. One way to achieve this goal is to post-process the output of classification models to obtain more accurate probabilities. These post-processing methods are often referred to as calibration methods in the machine learning literature.
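As a concrete illustration of this kind of post-processing (a generic baseline from the calibration literature, not one of the thesis's proposed methods), histogram binning recalibrates raw classifier scores on held-out data by mapping each score to the empirical positive rate of its bin:

```python
def histogram_binning(scores, labels, n_bins=10):
    """Fit a histogram-binning calibrator: each equal-width bin on [0, 1]
    maps a raw score to the empirical fraction of positives in that bin."""
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # clamp s == 1.0 into last bin
        bins[idx].append(y)
    # Empirical positive rate per bin; fall back to the bin midpoint if empty.
    rates = [sum(b) / len(b) if b else (i + 0.5) / n_bins
             for i, b in enumerate(bins)]

    def calibrate(s):
        return rates[min(int(s * n_bins), n_bins - 1)]

    return calibrate

# Hypothetical held-out scores and labels, for illustration only.
cal = histogram_binning([0.1, 0.2, 0.8, 0.9, 0.85], [0, 0, 1, 1, 0], n_bins=2)
```

Parametric alternatives (e.g. Platt scaling) instead fit a sigmoid to the score/label pairs; the thesis's methods refine the binning idea.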
This thesis describes a suite of parametric and non-parametric methods for calibrating the output of classification and prediction models. In order to evaluate the calibration performance of a classifier, we introduce two new calibration measures that are intuitive statistics of the calibration curves. We present extensive experimental results on both simulated and real datasets to evaluate the performance of the proposed methods compared with commonly used calibration methods in the literature. In particular, in terms of binary classifier calibration, our experimental results show that the proposed methods are able to improve the calibration power of classifiers while retaining their discrimination performance. Our theoretical findings show that by using a simple non-parametric calibration method, it is possible to improve the calibration performance of a classifier without sacrificing discrimination capability. The methods are also computationally tractable for large-scale datasets, as they run in O(N log N) time, where N is the number of samples.
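Statistics of a calibration curve are typically computed from a binned reliability diagram; the expected and maximum calibration error are two common summaries of this kind (the precise measures proposed in the thesis are defined in the dissertation itself). A minimal sketch:

```python
def calibration_errors(probs, labels, n_bins=10):
    """Binned calibration statistics: expected calibration error (ECE) is the
    sample-weighted mean gap between mean predicted probability and observed
    positive rate per bin; maximum calibration error (MCE) is the largest gap."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    ece, mce, n = 0.0, 0.0, len(probs)
    for b in bins:
        if not b:
            continue
        avg_p = sum(p for p, _ in b) / len(b)  # mean predicted probability
        frac = sum(y for _, y in b) / len(b)   # observed positive rate
        gap = abs(avg_p - frac)
        ece += (len(b) / n) * gap
        mce = max(mce, gap)
    return ece, mce

# Hypothetical predictions: each bin's confidence overshoots by 0.2 or undershoots by 0.2.
ece, mce = calibration_errors([0.2, 0.2, 0.8, 0.8], [0, 0, 1, 1], n_bins=2)
```

A perfectly calibrated classifier drives both statistics to zero.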
In this thesis we also introduce a novel framework for deriving calibrated probabilities of causal relationships from observational data. The framework consists of three main components: (1) an approximate method for generating initial probability estimates of the edge types for each pair of variables, (2) a relatively small set of causal relationships in the network for which the truth status is known, which we call a calibration training set, and (3) a calibration method that uses the approximate probability estimates and the calibration training set to generate calibrated probabilities for the many remaining pairs of variables. Our experiments on a range of simulated datasets show that the proposed approach improves the calibration of edge predictions, and they also indicate that it often improves the precision and recall of those predictions.
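The three-component workflow might be sketched as follows (the function and variable names here are hypothetical; the thesis specifies the actual estimation and calibration procedures):

```python
def calibrate_edge_probabilities(initial, known, fit_calibrator):
    """Sketch of the three-component framework:
    (1) `initial`: approximate probability estimate per variable pair (component 1),
    (2) `known`: the calibration training set -- pairs with known truth status (component 2),
    (3) `fit_calibrator`: any calibration method fit on (estimate, truth) pairs (component 3)."""
    scores = [initial[pair] for pair in known]
    truths = [known[pair] for pair in known]
    cal = fit_calibrator(scores, truths)  # e.g. a binning- or regression-based calibrator
    # Apply the fitted calibrator to every remaining (unlabeled) pair.
    return {pair: cal(p) for pair, p in initial.items() if pair not in known}

# Toy illustration with an identity "calibrator" standing in for a real method.
initial = {("a", "b"): 0.9, ("b", "c"): 0.4}
known = {("a", "b"): 1}
calibrated = calibrate_edge_probabilities(initial, known,
                                          lambda s, t: (lambda p: p))
```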
Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Pakdaman Naeini, Mahdi (map218@pitt.edu; Pitt username: map218)
Date: 27 January 2017
Date Type: Publication
Defense Date: 5 August 2016
Approval Date: 27 January 2017
Submission Date: 28 November 2016
Access Restriction: No restriction; release the ETD for access worldwide immediately.
Number of Pages: 150
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Intelligent Systems
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Classifier calibration, causality detection, Bayesian Binning into Quantiles (BBQ), Ensemble of Linear Trend Estimation (ELiTE), Ensemble of Near-Isotonic Regression (ENIR)
Date Deposited: 27 Jan 2017 17:00
Last Modified: 28 Jan 2017 06:15
URI: http://d-scholarship.pitt.edu/id/eprint/30526
Available Versions of this Item

- OBTAINING ACCURATE PROBABILITIES USING CLASSIFIER CALIBRATION. (deposited 27 Jan 2017 17:00) [currently displayed]