
From Variance-Reduced Initialization to Knowledge Distillation-Inspired Pruning at Initialization: Embedding Efficiency Right from the Onset of Neural Network Training

Puthiaraju, Ganesh and Zeng, Bo and Mao, Zhi-Hong (2024) From Variance-Reduced Initialization to Knowledge Distillation-Inspired Pruning at Initialization: Embedding Efficiency Right from the Onset of Neural Network Training. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Full text: PDF (8MB)

Abstract

The metaphor of Artificial Intelligence (AI) as “the new electricity” aptly describes its evolution into a ubiquitous tool, but this progress has come at a steep price: the increasing complexity of Deep Neural Network (DNN) architectures presents formidable training challenges. This dissertation develops solutions to some of the primary challenges associated with training and embeds efficiency from the very outset. As a remedy to the exploding and vanishing gradient problem (EVGP) and to the highly irregular optimization landscape that hinders learning, the study introduces a universally applicable Variance-Reduced initialization technique that initializes weights as Gaussian random matrices, with the parameters of the distribution derived using a Gaussian integral. The weight matrices are then “Variance-Reduced” through a carefully designed, architecture-dependent process. Theoretically, we demonstrate that the technique positions the initial parameters closer to the optimum and facilitates faster convergence; experimentally, we show that it generalizes better, promotes a more stable learning process, and yields superior test performance.

The thesis then addresses a further challenge, overparameterization, with a paradigm shift in pruning-at-initialization: the Knowledge Distillation-based Lottery Ticket Search (KD-LTS). This framework efficiently extracts heterogeneous information from an ensemble of teacher networks. By applying a series of deterministic relaxations to a Mixed Integer Optimization problem that trains binary masks, it transfers the distilled information into a dense, randomly initialized student network, thereby identifying sparse subnetworks at initialization. This work is believed to be the first in the literature to achieve state-of-the-art results along the Pareto-optimal frontier of sparsity, test performance, and computational complexity for identifying subnetworks at initialization.
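The exact derivation of the initialization is given in the dissertation itself; the sketch below only illustrates the general shape of such a scheme, assuming a He-style Gaussian baseline and a hypothetical architecture-dependent reduction factor (the function name variance_reduced_init_ and the reduction parameter are illustrative, not the authors' notation).

    import math
    import torch
    import torch.nn as nn

    def variance_reduced_init_(module, reduction=0.5):
        # Sketch only: draw Gaussian weights with a fan-in-scaled standard
        # deviation, then shrink their variance by a reduction factor. The
        # dissertation derives its distribution parameters from a Gaussian
        # integral and an architecture-dependent reduction process; the
        # constant factor here is a stand-in assumption.
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            fan_in = module.weight[0].numel()  # inputs feeding each unit
            std = math.sqrt(2.0 / fan_in)      # He-style Gaussian baseline
            with torch.no_grad():
                module.weight.normal_(0.0, std)
                module.weight.mul_(math.sqrt(reduction))  # reduce variance
                if module.bias is not None:
                    module.bias.zero_()

    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    model.apply(variance_reduced_init_)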
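KD-LTS itself is specified in the dissertation; as a hedged illustration of the two ingredients named in the abstract, the sketch below trains relaxed binary mask logits over frozen, randomly initialized weights (a sigmoid relaxation with temperature tau standing in for the deterministic relaxations of the mixed-integer problem) and distills an averaged teacher-ensemble signal into the masked student. All names and hyperparameters are assumptions for illustration.

    import torch
    import torch.nn.functional as F

    class MaskedLinear(torch.nn.Module):
        # Weights stay frozen at their random initialization; only the mask
        # logits ("scores") are trained. sigmoid(scores / tau) is a smooth
        # relaxation of a binary {0, 1} mask; annealing tau toward zero is
        # one deterministic way to recover a near-binary mask.
        def __init__(self, in_features, out_features, tau=1.0):
            super().__init__()
            w = torch.randn(out_features, in_features) / in_features ** 0.5
            self.weight = torch.nn.Parameter(w, requires_grad=False)
            self.scores = torch.nn.Parameter(torch.zeros(out_features, in_features))
            self.tau = tau

        def forward(self, x):
            mask = torch.sigmoid(self.scores / self.tau)
            return F.linear(x, self.weight * mask)

    def kd_loss(student_logits, teacher_logits_list, targets, T=4.0, alpha=0.7):
        # Distill the mean of the teacher ensemble's softened outputs into
        # the student, blended with the ordinary cross-entropy loss.
        t_mean = torch.stack(teacher_logits_list).mean(dim=0)
        kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                      F.softmax(t_mean / T, dim=-1),
                      reduction="batchmean") * (T * T)
        ce = F.cross_entropy(student_logits, targets)
        return alpha * kd + (1 - alpha) * ce

Because requires_grad=False freezes the weights, an optimizer built over the model's parameters updates only the mask logits, so the subnetwork is identified at initialization without training the weights themselves.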



Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Puthiaraju, Ganesh | gap48@pitt.edu | Pitt username: gap48 | ORCID: 0009-0001-5318-5753
Zeng, Bo | bzeng@pitt.edu | Pitt username: bzeng
Mao, Zhi-Hong | zhm4@pitt.edu | Pitt username: zhm4
ETD Committee:
Committee CoChair | Mao, Zhi-Hong | zhm4@pitt.edu | Pitt username: zhm4
Committee CoChair | Zeng, Bo | bzeng@pitt.edu | Pitt username: bzeng
Committee Member | Dickerson, Samuel | dickerson@pitt.edu | Pitt username: dickerson
Committee Member | Dallal, Ahmed | ahd12@pitt.edu | Pitt username: ahd12
Committee Member | Sun, Mingui | drsun@pitt.edu | Pitt username: drsun
Committee Member | Zhan, Liang | liang.zhan@pitt.edu | Pitt username: liang.zhan
Date: 6 September 2024
Date Type: Publication
Defense Date: 22 March 2024
Approval Date: 6 September 2024
Submission Date: 18 July 2024
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 134
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Artificial Intelligence, Neural Networks, Deep Learning, Lottery Ticket Hypothesis, Pruning, Initialization
Date Deposited: 06 Sep 2024 20:03
Last Modified: 06 Sep 2024 20:03
URI: http://d-scholarship.pitt.edu/id/eprint/46703
