Link to the University of Pittsburgh Homepage
Link to the University Library System Homepage Link to the Contact Us Form


Amizadeh, Saeed (2014) NON-PARAMETRIC GRAPH-BASED METHODS FOR LARGE SCALE PROBLEMS. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Accepted Version

Download (3MB) | Preview


The notion of similarity between observations plays a very fundamental role in many Machine Learning and Data Mining algorithms. In
many of these methods, the fundamental problem of prediction, which is making assessments and/or inferences about the future observations from the
past ones, boils down to how ``similar'' the future cases are to the already observed ones. However, similarity is not always
obtained through the traditional distance metrics. Data-driven similarity metrics, in particular, come into play where the traditional absolute
metrics are not sufficient for the task in hand due to special structure of the observed data. A common approach for computing data-driven similarity
is to somehow
aggregate the local absolute similarities (which are not data-driven and can be computed in a closed-from) to infer a global
data-driven similarity value between any pair of observations. The graph-based methods offer a natural framework to do so.
Incorporating these methods, many of the Machine Learning algorithms, that are
designed to work with absolute distances, can be applied on those problems with data-driven distances. This makes graph-based methods
very effective tools for many real-world problems.

In this thesis, the major problem that I want to address is the scalability of the graph-based methods. With the rise of large-scale,
high-dimensional datasets in many real-world applications, many Machine Learning algorithms do not scale up well applying to these problems.
The graph-based methods are no exception
either. Both the large number of observations and the high dimensionality hurt graph-based methods,
computationally and statistically. While the large number of observations imposes more of a computational problem, the high dimensionality
problem has more of a statistical nature. In this thesis, I address both of these issues in depth and review the common solutions for them proposed
in the literature. Moreover, for each of these problems, I propose novel solutions with experimental results depicting the merits of the proposed
algorithms. Finally, I discuss the contribution of the proposed work from a broader viewpoint and draw some future directions of the current work.


Social Networking:
Share |


Item Type: University of Pittsburgh ETD
Status: Unpublished
CreatorsEmailPitt UsernameORCID
ETD Committee:
TitleMemberEmail AddressPitt UsernameORCID
Committee ChairHauskrecht, Milosmilos@cs.pitt.eduMILOS
Committee MemberCooper, Gregory Fgfc@cbmi.pitt.eduGFC
Committee MemberDruzdzel, Marek J.marek@sis.pitt.eduDRUZDZEL
Committee MemberChennubhotla, Chakrachakracs@pitt.eduCHAKRACS
Committee MemberVisweswaran, Shyamshv3@pitt.eduSHV3
Committee MemberNugent,
Date: 28 January 2014
Date Type: Publication
Defense Date: 28 August 2013
Approval Date: 28 January 2014
Submission Date: 29 October 2013
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 186
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Intelligent Systems
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Machine Learning, Data-Driven Similarity, Graph-based Methods, Spectral Graph-based Methods, Non-parametric Methods, Large-Scale Problems, N-body Problems, Curse of Dimensionality, Label Propagation, Semi-supervised Learning, Spectral Clustering
Date Deposited: 28 Jan 2014 16:25
Last Modified: 15 Nov 2016 14:15


Monthly Views for the past 3 years

Plum Analytics

Actions (login required)

View Item View Item