Link to the University of Pittsburgh Homepage
Link to the University Library System Homepage Link to the Contact Us Form

Distributed Sparse Computing and Communication for Big Graph Analytics and Deep Learning

Hasanzadeh Mofrad, Mohammad (2021) Distributed Sparse Computing and Communication for Big Graph Analytics and Deep Learning. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Download (6MB) | Preview


Sparsity can be found in the underlying structure of many real-world computationally expensive problems including big graph analytics and large scale sparse deep neural networks. In addition, if gracefully investigated, many of these problems contain a broad substratum of parallelism suitable for parallel and distributed executions of sparse computation. However, usually, dense computation is preferred to its sparse alternative as sparse computation is not only hard to parallelize due to the irregular nature of the sparse data, but also complicated to implement in terms of rewriting a dense algorithm into a sparse one. Hence, foolproof sparse computation requires customized data structures to encode the sparsity of the sparse data and new algorithms to mask the complexity of the sparse computation. However, by carefully exploiting the sparse data structures and algorithms, sparse computation can reduce memory consumption, communication volume, and processing power and thus undoubtedly move the scalability boundaries compared to its dense equivalent.

In this dissertation, I explain how to use parallel and distributed computing techniques in the presence of sparsity to solve large scientific problems including graph analytics and deep learning. To meet this end goal, I leverage the duality between graph theory and sparse linear algebra primitives, and thus solve graph analytics and deep learning problems with the sparse matrix operations. My contributions are fourfold: (1) design and implementation of a new distributed compressed sparse matrix data structure that reduces both computation and communication volumes and is suitable for sparse matrix-vector and sparse matrix-matrix operations, (2) introducing the new MPI*X parallelism model that deems threads as basic units of computing and communication, (3) optimizing sparse matrix-matrix multiplication by employing different hashing techniques, and (4) proposing the new data-then-model parallelism that mitigates the effect of stragglers in sparse deep learning by combining data and model parallelisms. Altogether, these contributions provide a set of data structures and algorithms to accelerate and scale the sparse computing and communication.


Social Networking:
Share |


Item Type: University of Pittsburgh ETD
Status: Unpublished
CreatorsEmailPitt UsernameORCID
Hasanzadeh Mofrad, Mohammadm.hasanzadeh.mofrad@gmail.commoh180000-0003-4369-6928
ETD Committee:
TitleMemberEmail AddressPitt UsernameORCID
Committee ChairMelhem, Ramimelhem@pitt.edumelhem
Committee MemberLabrinidis, Alexandroslabrinid@pitt.edulabrinid
Committee MemberJohn R, Langejlange@pitt.edujlange
Committee MemberPalanisamy, Balajibpalan@pitt.edubpalan
Committee MemberHammoud,
Date: 3 January 2021
Date Type: Publication
Defense Date: 30 October 2020
Approval Date: 3 January 2021
Submission Date: 3 November 2020
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 130
Institution: University of Pittsburgh
Schools and Programs: School of Computing and Information > Computer Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Distributed Systems, Parallel Processing, Graph Computing, Sparse computing, Deep Neural Networks, Sparse Matrix-Vector (SpMV), Sparse Matrix-matrix Multiplication (SpMM)
Date Deposited: 03 Jan 2021 20:39
Last Modified: 03 Jan 2021 20:39


Monthly Views for the past 3 years

Plum Analytics

Actions (login required)

View Item View Item