Link to the University of Pittsburgh Homepage
Link to the University Library System Homepage Link to the Contact Us Form

Relational Machine Learning Algorithms

Samadianzakaria, Alireza (2021) Relational Machine Learning Algorithms. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

[img]
Preview
PDF (Dissertation)
Primary Text

Download (899kB) | Preview

Abstract

The majority of learning tasks faced by data scientists involve relational data, yet most standard algorithms for standard learning problems are not designed to accept relational data as input. The standard practice to address this issue is to join the relational data to create the type of geometric input that standard learning algorithms expect. Unfortunately, this standard practice has exponential worst-case time and space complexity. This leads us to consider what we call the Relational Learning Question: "Which standard learning algorithms can be efficiently implemented on relational data, and for those that cannot, is there an alternative algorithm that can be efficiently implemented on relational data and that has similar performance guarantees to the standard algorithm?"

In this dissertation, we address the relational learning question for the well-known problems of support vector machine (SVM), logistic regression, and $k$-means clustering. First, we design an efficient relational algorithm for regularized linear SVM and logistic regression using sampling methods. We show how to implement a variation of gradient descent that provides a nearly optimal approximation guarantee for stable instances. For the $k$-means problem, we show that the $k$-means++ algorithm can be efficiently implemented on relational data, and that a slight variation of adaptive k-means algorithm can be efficiently implemented on relational data while maintaining a constant approximation guarantee. On the way to developing these algorithms, we give an efficient approximation algorithm for certain sum-product queries with additive inequalities that commonly arise.


Share

Citation/Export:
Social Networking:
Share |

Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
CreatorsEmailPitt UsernameORCID
Samadianzakaria, Alirezasamadian@cs.pitt.eduals4170000-0002-5890-1499
ETD Committee:
TitleMemberEmail AddressPitt UsernameORCID
Committee ChairPruhs, Kirkkirk@cs.pitt.edukrp2
Committee MemberChrysanthis, Panospanos@pitt.edupanos
Committee MemberKovashka, Adrianakovashka@cs.pitt.edukovashka
Committee MemberMoseley, Benjaminmoseleyb@andrew.cmu.edu
Date: 8 September 2021
Date Type: Publication
Defense Date: 7 July 2021
Approval Date: 8 September 2021
Submission Date: 30 July 2021
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 145
Institution: University of Pittsburgh
Schools and Programs: School of Computing and Information > Computer Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Relational Algorithms, In-Database Machine Learning, K Means, Linear SVM, Logistic Regression, Functional Aggregation Queries under Additive Inequality
Date Deposited: 08 Sep 2021 13:22
Last Modified: 08 Sep 2021 13:22
URI: http://d-scholarship.pitt.edu/id/eprint/41534

Metrics

Monthly Views for the past 3 years

Plum Analytics


Actions (login required)

View Item View Item