Puthiaraju, Ganesh and Zeng, Bo and Mao, Zhi-Hong
(2024)
From Variance-Reduced Initialization to Knowledge Distillation-Inspired Pruning at Initialization: Embedding Efficiency Right from the Onset of Neural Network Training.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
The metaphor of Artificial Intelligence (AI) as “the new electricity” aptly describes its evolution into a ubiquitous tool, but this progress has come at a steep price: the increasing complexity of Deep Neural Network (DNN) architectures presents formidable training challenges. This dissertation develops solutions to some of the primary challenges associated with training and embeds efficiency right from the outset. As a remedy to the exploding and vanishing gradient problem (EVGP) and to a highly irregular optimization landscape that hinders learning, the study introduces a universally applicable Variance-Reduced initialization technique that initializes weights as Gaussian random matrices, with the parameters of the distribution derived using a Gaussian integral. The weight matrices are then “Variance-Reduced” through a carefully designed process that depends on the network architecture. Theoretically, we demonstrate that this technique positions the initial parameters closer to the optimum and facilitates faster convergence. Experimentally, we show that the approach generalizes better, promotes a more stable learning process, and delivers superior test performance. Furthermore, the thesis addresses overparameterization, yet another challenge, presenting a paradigm shift in pruning-at-initialization with the Knowledge Distillation-based Lottery Ticket Search (KD-LTS). This framework efficiently extracts heterogeneous information from an ensemble of teacher networks. By employing a series of deterministic relaxations of a Mixed Integer Optimization problem for training binary masks, the technique transfers the distilled information into a dense, randomly initialized student network, thereby identifying subnetworks at initialization. This work is believed to be the first in the literature to achieve state-of-the-art results along a Pareto-optimal frontier spanning sparsity, test performance, and computational complexity for identifying subnetworks at initialization.
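To make the two ideas concrete, the sketches below illustrate them under stated assumptions; they are not the dissertation's implementation. The first sketch assumes the variance reduction amounts to drawing weights from a zero-mean Gaussian whose per-layer variance is a fan-in-based value scaled down by a reduction factor; the thesis instead derives its distribution parameters from a Gaussian integral and an architecture-dependent procedure that is not reproduced here.

```python
import math
import torch.nn as nn

def variance_reduced_init_(model: nn.Module, reduction: float = 0.5) -> None:
    """Hypothetical sketch of a variance-reduced Gaussian initialization.

    Assumption: the per-layer variance is a fan-in-based value (in the spirit
    of He/Xavier-style schemes) multiplied by `reduction` < 1. The dissertation
    derives its parameters differently (via a Gaussian integral).
    """
    for m in model.modules():
        if isinstance(m, nn.Linear):
            fan_in = m.in_features
        elif isinstance(m, nn.Conv2d):
            fan_in = m.in_channels * m.kernel_size[0] * m.kernel_size[1]
        else:
            continue
        std = math.sqrt(reduction * 2.0 / fan_in)   # assumed form, not the thesis formula
        nn.init.normal_(m.weight, mean=0.0, std=std)
        if m.bias is not None:
            nn.init.zeros_(m.bias)
```

The second sketch illustrates the kind of mechanism KD-LTS describes: frozen, randomly initialized weights gated by learnable mask scores, a deterministic sigmoid relaxation of the binary mask variables, and a distillation loss that averages soft labels from an ensemble of teachers plus a sparsity penalty. The names (`MaskedLinear`, `kd_mask_loss`), the sigmoid relaxation, and the loss weights are illustrative assumptions, not the exact relaxation sequence or Mixed Integer formulation used in the dissertation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Linear):
    """Frozen random weights gated by a learnable, relaxed binary mask
    (an illustrative stand-in for KD-LTS mask variables)."""

    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super().__init__(in_features, out_features, bias)
        self.weight.requires_grad_(False)                 # weights stay at initialization
        self.mask_score = nn.Parameter(torch.zeros_like(self.weight))

    def forward(self, x: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
        soft_mask = torch.sigmoid(self.mask_score / temperature)  # deterministic relaxation
        return F.linear(x, self.weight * soft_mask, self.bias)

def kd_mask_loss(student_logits, teacher_logits_list, soft_masks, T=4.0, lam=1e-3):
    """Distillation from an averaged teacher ensemble plus a sparsity penalty."""
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  teacher_probs, reduction="batchmean") * (T * T)
    sparsity = sum(m.mean() for m in soft_masks)          # pushes masks toward zero
    return kd + lam * sparsity
```

In such a setup, annealing the temperature and thresholding the relaxed masks would yield the binary masks that define a subnetwork of the still-untrained student; the dissertation's deterministic relaxation schedule for the underlying Mixed Integer problem is the authoritative version of this step.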
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
ETD Committee:
Date: 6 September 2024
Date Type: Publication
Defense Date: 22 March 2024
Approval Date: 6 September 2024
Submission Date: 18 July 2024
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 134
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Artificial Intelligence, Neural Networks, Deep Learning, Lottery Ticket Hypothesis, Pruning, Initialization
Date Deposited: 06 Sep 2024 20:03
Last Modified: 06 Sep 2024 20:03
URI: http://d-scholarship.pitt.edu/id/eprint/46703