
Evaluation of Parameter-Scaling for Efficient Deep Learning on Small Satellites

Gealy, Calvin (2022) Evaluation of Parameter-Scaling for Efficient Deep Learning on Small Satellites. Master's Thesis, University of Pittsburgh. (Unpublished)

Primary Text (257kB)


Parameter-scaling techniques change the number of parameters in a machine-learning model to make the network more amenable to different device types or accuracy requirements. This research compares the performance of two such techniques. NeuralScale is a neural architecture search method that claims to generate deep neural networks for resource-constrained devices. It shrinks a network to a target number of parameters by adjusting the width of each layer independently, achieving higher accuracy than previous methods. The NeuralScale algorithm is compared against a baseline of uniform scaling in the style of the MobileNet family, where the width of every layer is scaled by the same factor across the network. Latency and runtime-memory measurements for inference were gathered on the NVIDIA Jetson TX2 and Jetson AGX Xavier embedded GPUs using NVIDIA TensorRT, and on the Raspberry Pi 4 embedded CPU, which features ARM Cortex-A72 cores, using ONNX Runtime. VGG-11, MobileNetV2, Pre-Activation ResNet-18, and ResNet-50 were each scaled to 0.25×, 0.50×, 0.75×, and 1.00× their original number of parameters. On embedded GPUs, this research finds that NeuralScale models do offer higher accuracy, but they run slower and consume much more runtime memory during inference than their uniformly scaled counterparts. On average, NeuralScale is 40% as efficient as uniform scaling in terms of accuracy per megabyte of runtime memory and uses 2.7× the runtime memory per parameter. On the embedded CPU, NeuralScale is slightly more efficient than uniform scaling in accuracy per megabyte of memory, using essentially the same memory per parameter, but its inference latency is on average more than 2.5× higher.
Importantly, parameter count alone does not predict runtime-memory usage across scaling methods on embedded GPUs, while latency grows significantly on embedded CPUs.
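The uniform-scaling baseline described above can be sketched in a few lines. This is an illustrative example only, not code from the thesis: the layer widths and the helper name `uniform_scale` are assumptions, and it relies on the rough rule that conv/linear parameter counts grow with the product of input and output widths, so scaling every width by α changes the parameter count by about α².

```python
import math

def uniform_scale(widths, param_ratio):
    """Scale every layer width by the same factor.

    Parameter counts of conv/linear layers grow roughly with the
    product of adjacent layer widths, so scaling each width by alpha
    scales parameters by about alpha**2. To hit a target parameter
    ratio r, use alpha = sqrt(r).
    """
    alpha = math.sqrt(param_ratio)
    # Round to the nearest whole channel, keeping at least one.
    return [max(1, round(w * alpha)) for w in widths]

# VGG-11-style channel widths (illustrative, not the thesis configuration)
widths = [64, 128, 256, 256, 512, 512, 512, 512]
print(uniform_scale(widths, 0.25))
# → [32, 64, 128, 128, 256, 256, 256, 256]  (each width halved)
```

NeuralScale, by contrast, picks a separate width multiplier per layer via architecture search, which is what drives both its accuracy gains and its higher runtime-memory footprint.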




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Gealy, Calvin (Email: c.gealy@pitt.edu, Pitt Username: cag158, ORCID: 0000-0003-1173-0378)
ETD Committee:
Thesis Advisor: George,
Committee Member: Abdelhakim, Mai (Email: maia@pitt.edu, Pitt Username: MAIA, ORCID: 0000-0001-8442-0974)
Committee Member: Hu, Jingtong (Email: JTHU@pitt.edu, Pitt Username: jthu, ORCID: 0000-0003-4029-4034)
Date: 10 June 2022
Date Type: Publication
Defense Date: 18 March 2022
Approval Date: 10 June 2022
Submission Date: 3 March 2022
Access Restriction: 2 year -- Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 47
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: machine learning, computer vision, embedded computing
Date Deposited: 10 Jun 2022 18:14
Last Modified: 10 Jun 2024 05:15

