Distributed Sparse Computing and Communication for Big Graph Analytics and Deep Learning

Hasanzadeh Mofrad, Mohammad (2021) Distributed Sparse Computing and Communication for Big Graph Analytics and Deep Learning. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Preview

PDF
Download (6MB) | Preview

Abstract

Sparsity can be found in the underlying structure of many real-world computationally expensive problems including big graph analytics and large scale sparse deep neural networks. In addition, if gracefully investigated, many of these problems contain a broad substratum of parallelism suitable for parallel and distributed executions of sparse computation. However, usually, dense computation is preferred to its sparse alternative as sparse computation is not only hard to parallelize due to the irregular nature of the sparse data, but also complicated to implement in terms of rewriting a dense algorithm into a sparse one. Hence, foolproof sparse computation requires customized data structures to encode the sparsity of the sparse data and new algorithms to mask the complexity of the sparse computation. However, by carefully exploiting the sparse data structures and algorithms, sparse computation can reduce memory consumption, communication volume, and processing power and thus undoubtedly move the scalability boundaries compared to its dense equivalent.

In this dissertation, I explain how to use parallel and distributed computing techniques in the presence of sparsity to solve large scientific problems including graph analytics and deep learning. To meet this end goal, I leverage the duality between graph theory and sparse linear algebra primitives, and thus solve graph analytics and deep learning problems with the sparse matrix operations. My contributions are fourfold: (1) design and implementation of a new distributed compressed sparse matrix data structure that reduces both computation and communication volumes and is suitable for sparse matrix-vector and sparse matrix-matrix operations, (2) introducing the new MPI*X parallelism model that deems threads as basic units of computing and communication, (3) optimizing sparse matrix-matrix multiplication by employing different hashing techniques, and (4) proposing the new data-then-model parallelism that mitigates the effect of stragglers in sparse deep learning by combining data and model parallelisms. Altogether, these contributions provide a set of data structures and algorithms to accelerate and scale the sparse computing and communication.

Citation/Export:
Social Networking:	Share \|

Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Hasanzadeh Mofrad, Mohammad	m.hasanzadeh.mofrad@gmail.com	moh18	0000-0003-4369-6928

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee Chair	Melhem, Rami	melhem@pitt.edu	melhem
Committee Member	Labrinidis, Alexandros	labrinid@pitt.edu	labrinid
Committee Member	John R, Lange	jlange@pitt.edu	jlange
Committee Member	Palanisamy, Balaji	bpalan@pitt.edu	bpalan
Committee Member	Hammoud, Mohammad	mhhamoud@cmu.edu

Date:

3 January 2021

Date Type:

Publication

Defense Date:

30 October 2020

Approval Date:

3 January 2021

Submission Date:

3 November 2020

Access Restriction:

No restriction; Release the ETD for access worldwide immediately.

Number of Pages:

130

Institution:

University of Pittsburgh

Schools and Programs:

School of Computing and Information > Computer Science

Degree:

PhD - Doctor of Philosophy

Thesis Type:

Doctoral Dissertation

Refereed:

Yes

Uncontrolled Keywords:

Distributed Systems, Parallel Processing, Graph Computing, Sparse computing, Deep Neural Networks, Sparse Matrix-Vector (SpMV), Sparse Matrix-matrix Multiplication (SpMM)

Date Deposited:

03 Jan 2021 20:39

Last Modified:

03 Jan 2021 20:39

URI:

http://d-scholarship.pitt.edu/id/eprint/39841

Metrics

Monthly Views for the past 3 years

Plum Analytics

Actions (login required)

View Item

My Account

Search

Browse

Information

Distributed Sparse Computing and Communication for Big Graph Analytics and Deep Learning

Abstract

Share

Details

Metrics

Monthly Views for the past 3 years

Plum Analytics

Actions (login required)

Connect with us

Send Comments or Questions

Feeds