Amizadeh, Saeed
(2014)
NON-PARAMETRIC GRAPH-BASED METHODS FOR LARGE SCALE PROBLEMS.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
The notion of similarity between observations plays a very fundamental role in many Machine Learning and Data Mining algorithms. In
many of these methods, the fundamental problem of prediction, which is making assessments and/or inferences about the future observations from the
past ones, boils down to how ``similar'' the future cases are to the already observed ones. However, similarity is not always
obtained through the traditional distance metrics. Data-driven similarity metrics, in particular, come into play where the traditional absolute
metrics are not sufficient for the task in hand due to special structure of the observed data. A common approach for computing data-driven similarity
is to somehow
aggregate the local absolute similarities (which are not data-driven and can be computed in a closed-from) to infer a global
data-driven similarity value between any pair of observations. The graph-based methods offer a natural framework to do so.
Incorporating these methods, many of the Machine Learning algorithms, that are
designed to work with absolute distances, can be applied on those problems with data-driven distances. This makes graph-based methods
very effective tools for many real-world problems.
In this thesis, the major problem that I want to address is the scalability of the graph-based methods. With the rise of large-scale,
high-dimensional datasets in many real-world applications, many Machine Learning algorithms do not scale up well applying to these problems.
The graph-based methods are no exception
either. Both the large number of observations and the high dimensionality hurt graph-based methods,
computationally and statistically. While the large number of observations imposes more of a computational problem, the high dimensionality
problem has more of a statistical nature. In this thesis, I address both of these issues in depth and review the common solutions for them proposed
in the literature. Moreover, for each of these problems, I propose novel solutions with experimental results depicting the merits of the proposed
algorithms. Finally, I discuss the contribution of the proposed work from a broader viewpoint and draw some future directions of the current work.
Share
Citation/Export: |
|
Social Networking: |
|
Details
Item Type: |
University of Pittsburgh ETD
|
Status: |
Unpublished |
Creators/Authors: |
|
ETD Committee: |
|
Date: |
28 January 2014 |
Date Type: |
Publication |
Defense Date: |
28 August 2013 |
Approval Date: |
28 January 2014 |
Submission Date: |
29 October 2013 |
Access Restriction: |
No restriction; Release the ETD for access worldwide immediately. |
Number of Pages: |
186 |
Institution: |
University of Pittsburgh |
Schools and Programs: |
Dietrich School of Arts and Sciences > Intelligent Systems |
Degree: |
PhD - Doctor of Philosophy |
Thesis Type: |
Doctoral Dissertation |
Refereed: |
Yes |
Uncontrolled Keywords: |
Machine Learning, Data-Driven Similarity, Graph-based Methods, Spectral Graph-based Methods, Non-parametric Methods, Large-Scale Problems,
N-body Problems, Curse of Dimensionality, Label Propagation, Semi-supervised Learning, Spectral Clustering |
Date Deposited: |
28 Jan 2014 16:25 |
Last Modified: |
15 Nov 2016 14:15 |
URI: |
http://d-scholarship.pitt.edu/id/eprint/19949 |
Metrics
Monthly Views for the past 3 years
Plum Analytics
Actions (login required)
|
View Item |