Hong, Charmgil
(2018)
Multivariate Data Modeling and Its Applications to Conditional Outlier Detection.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
With recent advances in data technology, large amounts of data of various kinds and from various sources are being generated and collected every second. The increase in the amount of collected data is often accompanied by an increase in the complexity of the data types and objects we are able to store. The next challenge is the development of machine learning methods for their analysis. This thesis contributes to the effort by focusing on the analysis of one such data type, complex input-output data objects with high-dimensional multivariate binary output spaces, and two data-analytic problems: Multi-Label Classification and Conditional Outlier Detection.
First, we study the Multi-Label Classification (MLC) problem, which concerns the classification of data instances into multiple binary output (class or response) variables that reflect different views, functions, or components describing the data. We present three MLC frameworks that effectively learn and predict the best output configuration for complex input-output data objects. Our experimental evaluation on a range of datasets shows that our solutions outperform several state-of-the-art MLC methods and produce more reliable posterior probability estimates.
Second, we investigate the Conditional Outlier Detection (COD) problem, where our goal is to identify unusual patterns observed in the multi-dimensional binary output space given their input context. We make two important contributions to the definition and solutions of COD. First, observing a gap between the development of unconditional and conditional outlier detection approaches, we propose a ratio of outlier scores (ROS) that uses a pair of unconditional scores to calculate the conditional scores. Second, we show that by applying the chain decomposition of the probabilistic model, the probabilistic multivariate COD score decomposes into a set of probabilistic univariate COD scores. This decomposition can subsequently be generalized and extended to a broad spectrum of multivariate COD scores, including the new ROS score and its variants, leading to a new multivariate conditional outlier scoring framework. Through experiments on synthetic and real-world datasets with simulated outliers, we provide empirical results that support the validity of our COD methods.
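The chain decomposition described above can be illustrated with a short derivation. This is only a sketch: it assumes the probabilistic multivariate COD score is the negative log conditional probability of the binary output vector y = (y_1, ..., y_d) given the input x. That scoring choice, the symbol d, and the ordering of the output variables are assumptions made for illustration and are not stated in the abstract.

```latex
\begin{align}
P(y \mid x) &= \prod_{i=1}^{d} P(y_i \mid x, y_1, \dots, y_{i-1}) \\
-\log P(y \mid x) &= \sum_{i=1}^{d} -\log P(y_i \mid x, y_1, \dots, y_{i-1})
\end{align}
```

Under these assumptions, each term in the sum is a univariate conditional score for a single output variable, conditioned on the input and on the outputs that precede it in the chain, so the multivariate score is the sum of d univariate scores.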
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Hong, Charmgil
Date: 31 January 2018
Date Type: Publication
Defense Date: 9 August 2017
Approval Date: 31 January 2018
Submission Date: 28 September 2017
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 178
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Computer Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: multi-label classification, structured prediction, classification, conditional outlier detection, outlier detection
Date Deposited: 31 Jan 2018 17:34
Last Modified: 31 Jan 2019 06:15
URI: http://d-scholarship.pitt.edu/id/eprint/33223