
Multivariate Data Modeling and Its Applications to Conditional Outlier Detection

Hong, Charmgil (2018) Multivariate Data Modeling and Its Applications to Conditional Outlier Detection. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

With recent advances in data technology, large amounts of data of various kinds and from various sources are being generated and collected every second. The increase in the amount of collected data is often accompanied by an increase in the complexity of the data types and objects we are able to store. The next challenge is to develop machine learning methods for analyzing them. This thesis contributes to that effort by focusing on one such data type, complex input-output data objects with high-dimensional multivariate binary output spaces, and on two data-analytic problems: Multi-Label Classification and Conditional Outlier Detection.

First, we study the Multi-Label Classification (MLC) problem, which concerns the classification of data instances into multiple binary output (class or response) variables that reflect different views, functions, or components describing the data. We present three MLC frameworks that effectively learn and predict the best output configuration for complex input-output data objects. Our experimental evaluation on a range of datasets shows that our solutions outperform several state-of-the-art MLC methods and produce more reliable posterior probability estimates.

Second, we investigate the Conditional Outlier Detection (COD) problem, where our goal is to identify unusual patterns observed in the multi-dimensional binary output space given their input context. We make two contributions to the definition and solution of COD. First, observing a gap between the development of unconditional and conditional outlier detection approaches, we propose a ratio of outlier scores (ROS) that uses a pair of unconditional scores to calculate conditional scores. Second, we show that, by applying the chain decomposition of the probabilistic model, the probabilistic multivariate COD score decomposes into a set of probabilistic univariate COD scores. This decomposition can subsequently be generalized and extended to a broad spectrum of multivariate COD scores, including the new ROS score and its variants, leading to a new multivariate conditional outlier scoring framework. Through experiments on synthetic and real-world datasets with simulated outliers, we provide empirical results that support the validity of our COD methods.
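
The chain decomposition described above can be illustrated with a small sketch. It is not the thesis's implementation: it assumes the probabilistic multivariate COD score is the negative log conditional probability, -log P(y | x), and it uses a made-up probability table for a single fixed input x. The function name `chain_scores` and all numbers are hypothetical.

```python
import itertools
import math


def chain_scores(joint, y):
    """Univariate scores -log P(y_i | x, y_1..y_{i-1}) for one fixed input x.

    `joint` maps each binary output tuple to P(y | x) (hypothetical table).
    """
    scores = []
    for i in range(len(y)):
        # P(y_1..y_i | x) and P(y_1..y_{i-1} | x), obtained by marginalizing.
        num = sum(p for k, p in joint.items() if k[:i + 1] == y[:i + 1])
        den = sum(p for k, p in joint.items() if k[:i] == y[:i])
        scores.append(-math.log(num / den))
    return scores


# Made-up conditional distribution P(y | x) over three binary outputs.
weights = [8, 1, 1, 2, 1, 4, 1, 2]
configs = list(itertools.product([0, 1], repeat=3))
total = sum(weights)
joint = {c: w / total for c, w in zip(configs, weights)}

y = (0, 1, 0)
univariate = chain_scores(joint, y)   # one score per output dimension
multivariate = -math.log(joint[y])    # assumed multivariate score
```

Under these assumptions, the univariate scores sum to the multivariate score, since the chain rule gives P(y | x) as the product of the factors P(y_i | x, y_1..y_{i-1}).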


Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Creators | Email | Pitt Username | ORCID
Hong, Charmgil | charmgil@cs.pitt.edu | chh91 |
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Committee Chair | Hauskrecht, Milos | milos@cs.pitt.edu | milos |
Committee Member | Cooper, Gregory | gfc@pitt.edu | gfc |
Committee Member | Hwa, Rebecca | hwa@cs.pitt.edu | reh23 |
Committee Member | Kovashka, Adriana | kovashka@cs.pitt.edu | aik85 |
Date: 31 January 2018
Date Type: Publication
Defense Date: 9 August 2017
Approval Date: 31 January 2018
Submission Date: 28 September 2017
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 178
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Computer Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: multi-label classification, structured prediction, classification, conditional outlier detection, outlier detection
Date Deposited: 31 Jan 2018 17:34
Last Modified: 31 Jan 2019 06:15
URI: http://d-scholarship.pitt.edu/id/eprint/33223
