
Convex and Non-convex Model Compression for Large-Scale Model Training

Wu, Xidong (2024) Convex and Non-convex Model Compression for Large-Scale Model Training. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

PDF: Restricted to University of Pittsburgh users only until 3 June 2026.

Abstract

Machine learning (ML) models have recently been adopted across a wide variety of applications. Unlike traditional methods, however, ML models, especially deep learning models, rely on substantial depth and intricate architectures to improve their approximation capability, which strains device hardware at deployment time and communication channels during training. This challenge is particularly pronounced when deploying ML models on edge devices, which have limited storage and modest processing power. To address these issues, model compression has emerged as an approach for reducing the size of ML models with minimal performance degradation, easing deployment. Several model compression techniques have been explored, including weight pruning, knowledge distillation, and model screening. In addition, training these models requires substantial data, and distributed/federated training serves as a solution to data-related obstacles. The objective of this dissertation is to improve the efficiency of convex and non-convex models, with or without multi-party collaborative training (distributed and federated learning).

We develop approaches for compressing logistic classifiers (convex models) and deep learning models (non-convex models). In Task 1, we introduce a novel distributed dynamic safe screening framework for generalized sparse convex models. By discarding inactive features whose coefficients are provably zero, the framework reduces the model dimension ahead of optimization, accelerating distributed training and cutting communication overhead relative to traditional lasso techniques; a sketch of this idea appears below. In Task 2, we focus on applying foundation models in federated learning. Foundation models deliver strong performance and mitigate the impact of heterogeneous data distributions, and we explore compressing them to improve performance on edge devices. In Task 3, we study structural pruning in centralized learning, proposing a new algorithm in which a controller network guides end-to-end model pruning without additional fine-tuning after redundant structures are removed. Comprehensive large-scale experiments in distributed and centralized settings validate the rationale and efficacy of the proposed methods, and we provide theoretical analysis guaranteeing the convergence of the proposed algorithms.
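To make the safe-screening idea behind Task 1 concrete, the following is a minimal sketch of a gap-based safe screening test for the standard lasso, in the spirit of the dynamic screening literature. It illustrates the general mechanism only, not the dissertation's distributed framework; the function name and setup are ours, and lam is assumed positive.

    import numpy as np

    def gap_safe_screen(X, y, w, lam):
        # Lasso objective: 0.5 * ||y - X w||^2 + lam * ||w||_1.
        # Returns a boolean mask of features whose optimal coefficients
        # are provably zero and can therefore be dropped from training.
        residual = y - X @ w
        # Rescale the residual so theta is dual feasible
        # (||X^T theta||_inf <= 1).
        theta = residual / max(lam, np.max(np.abs(X.T @ residual)))
        primal = 0.5 * residual @ residual + lam * np.abs(w).sum()
        dual = 0.5 * y @ y - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
        # Radius of the gap-safe sphere around the dual point.
        radius = np.sqrt(2.0 * max(primal - dual, 0.0)) / lam
        # Feature j is inactive if |x_j^T theta| + radius * ||x_j||_2 < 1.
        scores = np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0)
        return scores < 1.0

As the iterate w improves, the duality gap shrinks, the test becomes tighter, and more coordinates are discarded. In a distributed setting, each worker can apply such a test to its local columns so that screened-out coordinates never need to be communicated again, which is the kind of dynamic dimension reduction the dissertation's framework generalizes to sparse convex models.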



Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Wu, Xidong (xidong_wu@pitt.edu, Pitt username: xiw156)
ETD Committee:
Committee Chair: Mao, Zhi-Hong (zhm4@pitt.edu)
Committee Co-Chair: Huang, Yufei (yuh119@pitt.edu)
Committee Member: Zhan, Liang (liang.zhan@pitt.edu)
Committee Member: Dallal, Ahmed (ahd12@pitt.edu)
Committee Member: Zeng, Bo (bzeng@pitt.edu)
Committee Member: Sun, Mingui (drsun@pitt.edu)
Date: 3 June 2024
Date Type: Publication
Defense Date: 1 April 2024
Approval Date: 3 June 2024
Submission Date: 2 April 2024
Access Restriction: 2 years -- Access restricted to the University of Pittsburgh for a period of 2 years.
Number of Pages: 113
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Model Compression, Sparse Learning, Safe Screening, Distributed Training, Model Pruning, Model Distillation
Date Deposited: 03 Jun 2024 14:41
Last Modified: 03 Jun 2024 14:41
URI: http://d-scholarship.pitt.edu/id/eprint/45974
