Samadianzakaria, Alireza
(2021)
Relational Machine Learning Algorithms.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
The majority of learning tasks faced by data scientists involve relational data, yet most standard algorithms for standard learning problems are not designed to accept relational data as input. The standard practice to address this issue is to join the relational data to create the type of geometric input that standard learning algorithms expect. Unfortunately, this standard practice has exponential worst-case time and space complexity. This leads us to consider what we call the Relational Learning Question: "Which standard learning algorithms can be efficiently implemented on relational data, and for those that cannot, is there an alternative algorithm that can be efficiently implemented on relational data and that has similar performance guarantees to the standard algorithm?"
In this dissertation, we address the relational learning question for the well-known problems of support vector machine (SVM), logistic regression, and $k$-means clustering. First, we design an efficient relational algorithm for regularized linear SVM and logistic regression using sampling methods. We show how to implement a variation of gradient descent that provides a nearly optimal approximation guarantee for stable instances. For the $k$-means problem, we show that the $k$-means++ algorithm can be efficiently implemented on relational data, and that a slight variation of adaptive k-means algorithm can be efficiently implemented on relational data while maintaining a constant approximation guarantee. On the way to developing these algorithms, we give an efficient approximation algorithm for certain sum-product queries with additive inequalities that commonly arise.
Share
Citation/Export: |
|
Social Networking: |
|
Details
Item Type: |
University of Pittsburgh ETD
|
Status: |
Unpublished |
Creators/Authors: |
|
ETD Committee: |
|
Date: |
8 September 2021 |
Date Type: |
Publication |
Defense Date: |
7 July 2021 |
Approval Date: |
8 September 2021 |
Submission Date: |
30 July 2021 |
Access Restriction: |
No restriction; Release the ETD for access worldwide immediately. |
Number of Pages: |
145 |
Institution: |
University of Pittsburgh |
Schools and Programs: |
School of Computing and Information > Computer Science |
Degree: |
PhD - Doctor of Philosophy |
Thesis Type: |
Doctoral Dissertation |
Refereed: |
Yes |
Uncontrolled Keywords: |
Relational Algorithms, In-Database Machine Learning, K Means, Linear SVM, Logistic Regression, Functional Aggregation Queries under Additive Inequality |
Date Deposited: |
08 Sep 2021 13:22 |
Last Modified: |
08 Sep 2021 13:22 |
URI: |
http://d-scholarship.pitt.edu/id/eprint/41534 |
Metrics
Monthly Views for the past 3 years
Plum Analytics
Actions (login required)
|
View Item |