

Pakdaman Naeini, Mahdi (2017) OBTAINING ACCURATE PROBABILITIES USING CLASSIFIER CALIBRATION. Doctoral Dissertation, University of Pittsburgh. (Unpublished)




Learning probabilistic classification and prediction models that generate accurate probabilities is essential in many prediction and decision-making tasks in machine learning and data mining. One way to achieve this goal is to post-process the output of classification models to obtain more accurate probabilities. These post-processing methods are often referred to as calibration methods in the machine learning literature.

This thesis describes a suite of parametric and non-parametric methods for calibrating the output of classification and prediction models. In order to evaluate the calibration performance of a classifier, we introduce two new calibration measures that are intuitive statistics of the calibration
curves. We present extensive experimental results on both simulated and real datasets to evaluate the performance of the proposed methods compared with commonly used calibration methods in the literature. In particular, in terms of binary classifier calibration, our experimental results
show that the proposed methods are able to improve the calibration power of classifiers while retaining their discrimination performance. Our theoretical findings show that by using a simple non-parametric calibration method, it is possible to improve the calibration performance of a classifier
without sacrificing discrimination capability. The methods are also computationally tractable for large-scale datasets as they run in O(N log N) time, where N is the number of samples.
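To make the binning-style calibration idea concrete, here is a minimal sketch of an equal-frequency (quantile) binning calibrator in NumPy. This is an illustrative simplification, not the BBQ, ELiTE, or ENIR methods themselves; the function names and the choice of a fixed number of bins are assumptions for the example. The dominant cost is the initial sort of the scores, consistent with the O(N log N) bound mentioned above.

```python
import numpy as np

def quantile_bin_calibrate(scores, labels, n_bins=10):
    """Fit an equal-frequency (quantile) binning calibrator.

    scores: uncalibrated classifier outputs in [0, 1]
    labels: binary ground-truth labels (0/1)
    Returns (bin_edges, bin_probs): the empirical positive rate per bin.
    Illustrative sketch only -- not the BBQ/ENIR methods of the thesis.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Sorting (implicit in the quantile computation) is the O(N log N) step.
    edges = np.quantile(scores, np.linspace(0.0, 1.0, n_bins + 1))
    edges[0], edges[-1] = 0.0, 1.0          # cover the full score range
    bin_ids = np.clip(np.searchsorted(edges, scores, side="right") - 1,
                      0, n_bins - 1)
    # Calibrated estimate per bin: empirical frequency of positives,
    # with 0.5 as a neutral fallback for empty bins.
    bin_probs = np.array([labels[bin_ids == b].mean()
                          if np.any(bin_ids == b) else 0.5
                          for b in range(n_bins)])
    return edges, bin_probs

def apply_calibration(edges, bin_probs, new_scores):
    """Map new raw scores to the empirical frequency of their bin."""
    ids = np.clip(np.searchsorted(edges, new_scores, side="right") - 1,
                  0, len(bin_probs) - 1)
    return bin_probs[ids]
```

In use, the calibrator is fit on a held-out calibration set and then applied to the raw scores of new instances, replacing each score with the empirical positive rate of its bin.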

In this thesis we also introduce a novel framework to derive calibrated probabilities of causal relationships from observational data. The framework consists of three main components: (1) an approximate method for generating initial probability estimates of the edge types for each pair
of variables, (2) the availability of a relatively small number of the causal relationships in the network for which the truth status is known, which we call a calibration training set, and (3) a calibration method for using the approximate probability estimates and the calibration training set
to generate calibrated probabilities for the many remaining pairs of variables. Our experiments on a range of simulated data support that the proposed approach improves the calibration of edge predictions. The results also support that the approach often improves the precision and recall of those predictions.
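The three components above can be sketched end to end as a single function. This is a hedged illustration under simplifying assumptions: the function name is hypothetical, the initial edge-probability estimates are taken as given (component 1), the calibration training set is a small index/label pair (component 2), and the calibration step (component 3) is a simple fixed-width binning calibrator standing in for the thesis's methods.

```python
import numpy as np

def calibrate_edge_probabilities(approx_probs, train_idx, train_truth, n_bins=5):
    """Sketch of the three-component framework:
      (1) approx_probs: initial, possibly miscalibrated, edge-probability
          estimates for every variable pair (from an approximate method),
      (2) train_idx / train_truth: a small calibration training set of
          pairs whose true edge status is known,
      (3) a binning calibrator fit on that set and applied to all pairs.
    Returns calibrated probabilities for every pair.
    """
    approx_probs = np.asarray(approx_probs, dtype=float)
    train_y = np.asarray(train_truth, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Fit: empirical truth frequency per bin on the calibration set.
    train_p = approx_probs[train_idx]
    bin_ids = np.clip(np.digitize(train_p, edges) - 1, 0, n_bins - 1)
    mids = (edges[:-1] + edges[1:]) / 2      # fallback for empty bins
    bin_probs = np.array([train_y[bin_ids == b].mean()
                          if np.any(bin_ids == b) else mids[b]
                          for b in range(n_bins)])
    # Apply: map every pair's initial estimate through the calibrator.
    all_ids = np.clip(np.digitize(approx_probs, edges) - 1, 0, n_bins - 1)
    return bin_probs[all_ids]
```

The design point the sketch captures is that only a small labeled subset is needed to fit the calibrator, which is then reused for the many remaining variable pairs.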




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Pakdaman Naeini, Mahdi (map218@pitt.edu, Pitt username: map218)
ETD Committee:
  Committee Chair: Cooper, Gregory
  Committee Member: Milos,
  Committee Member: Visweswaran,
  Committee Member: Schneider,
Date: 27 January 2017
Date Type: Publication
Defense Date: 5 August 2016
Approval Date: 27 January 2017
Submission Date: 28 November 2016
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 150
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Intelligent Systems
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Classifier calibration, causality detection, Bayesian Binning into Quantiles (BBQ), Ensemble of Linear Trend Estimation (ELiTE), Ensemble of Near Isotonic Regression (ENIR)
Date Deposited: 27 Jan 2017 17:00
Last Modified: 28 Jan 2017 06:15

Available Versions of this Item

  • OBTAINING ACCURATE PROBABILITIES USING CLASSIFIER CALIBRATION. (deposited 27 Jan 2017 17:00)

