Langerman, David (2022) In Pursuit of Graph Analysis for Neural-Network Performance Evaluation. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
PDF (Langerman 2022 Dissertation Final Draft): Primary Text, restricted to University of Pittsburgh users only until 10 June 2024.
Abstract
High-level deep-learning frameworks such as TensorFlow and PyTorch abstract computation and data movement from neural-network model designers, boosting productivity and enabling deep-learning models to grow ever larger and more complex in pursuit of superhuman accuracies. Some of the largest models can even require multi-node clusters to train and deploy efficiently. When these models are published, often only the total floating-point operations (FLOPs) and the parameter count are given as proxies for performance compared to other architectures. The widespread use of GPUs to execute these network models calls into question the validity of using purely computational measures to gauge algorithms that are not compute-bound. While leveraging FLOPs has traditionally been the de facto method of evaluating computational cost, it ignores memory-access penalties, kernel-launch overheads, and data-movement costs.
This dissertation chronicles the journey of identifying and addressing this issue, starting with a low-level hardware accelerator. Even though the FLOPs of the algorithm do not change, it was shown that the accelerator design alone can have a large impact on scalability and performance. From there, a foray into deep learning (DL) begins. An existing DL algorithm was augmented with a state-of-the-art backbone, resulting in a model with fewer FLOPs than the original. The goal was to boost the original network's performance. Instead, performance was lost, puzzling the researchers and leading to a deeper analysis of the model itself. It was discovered that the diameter of the directed acyclic graph describing a neural-network model (termed the Critical Datapath Length) was highly correlated with execution time. This phenomenon was shown across a set of 48 popular models running on multiple devices.
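The dissertation's own tooling is not reproduced on this page, but the idea behind the Critical Datapath Length can be sketched in a few lines: treat the model as a directed acyclic graph of operations and take the longest chain of operations from an input to an output. The Python sketch below is illustrative only; the toy graph, node names, and `critical_datapath_length` helper are hypothetical and not drawn from the dissertation.

```python
from collections import defaultdict, deque

def critical_datapath_length(edges):
    """Length (in operations) of the longest path through a DAG.

    `edges` is a list of (src, dst) pairs describing data flow between
    operations in a network's computation graph.
    """
    succ = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        succ[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))

    # Kahn's algorithm: visit nodes in topological order while tracking
    # the longest chain of operations that ends at each node.
    queue = deque(n for n in nodes if indegree[n] == 0)
    longest = {n: 1 for n in nodes}
    while queue:
        node = queue.popleft()
        for nxt in succ[node]:
            longest[nxt] = max(longest[nxt], longest[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return max(longest.values())

# Hypothetical residual block: two stacked convolutions on one branch,
# an identity skip connection on the other.
toy_graph = [
    ("input", "conv1"), ("conv1", "conv2"), ("conv2", "add"),
    ("input", "add"), ("add", "relu"),
]
print(critical_datapath_length(toy_graph))  # 5: input -> conv1 -> conv2 -> add -> relu
```

Two models with identical FLOPs can differ sharply in this longest chain of dependent operations, which is the intuition the abstract points to.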
The suite of networks was expanded to include over 400 networks with a much wider variety of architectural features. These networks were analyzed with both graph- and compute-based metrics to form a dataset with a standard set of metrics: input Size, Parameter count, total Operations, and Critical Datapath Length. This suite of metrics was dubbed SPOC. When analyzed together, SPOC metrics can give actionable performance intuition and showcase how graph metrics can describe the initially perplexing benchmarks that were collected when this voyage began.
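As a rough illustration of how such a metric suite might be assembled (this is not the dissertation's actual pipeline), the sketch below gathers the four SPOC values for a PyTorch model. The `SPOC` record and `collect_spoc` helper are hypothetical; the operation count and critical-datapath length are assumed to come from an external FLOP counter and a longest-path routine such as the one sketched above.

```python
import math
from dataclasses import dataclass

import torch

@dataclass
class SPOC:
    input_size: int                # S: number of input elements
    parameter_count: int           # P: number of model parameters
    total_operations: int          # O: e.g. FLOPs reported by an external profiler
    critical_datapath_length: int  # C: longest operation chain in the model's DAG

def collect_spoc(model: torch.nn.Module, input_shape, total_operations, cdl):
    """Assemble a SPOC record for `model`.

    `total_operations` and `cdl` must be supplied by the caller, e.g. from a
    FLOP-counting tool and a longest-path routine over the model's graph.
    """
    return SPOC(
        input_size=math.prod(input_shape),
        parameter_count=sum(p.numel() for p in model.parameters()),
        total_operations=total_operations,
        critical_datapath_length=cdl,
    )

# Example with a tiny stand-in model; O and C are placeholders here.
tiny = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU(), torch.nn.Linear(4, 2))
print(collect_spoc(tiny, (1, 8), total_operations=0, cdl=3))
```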
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Langerman, David
ETD Committee:
Date: 10 June 2022
Date Type: Publication
Defense Date: 6 April 2022
Approval Date: 10 June 2022
Submission Date: 11 March 2022
Access Restriction: 2 year -- Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 135
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Neural Networks, Performance Prediction, Indirect Metrics
Date Deposited: 10 Jun 2022 18:55
Last Modified: 10 Jun 2022 18:55
URI: http://d-scholarship.pitt.edu/id/eprint/42360