
New Efficient and Privacy-Preserving Methods for Distributed Training

Xu, An (2023) New Efficient and Privacy-Preserving Methods for Distributed Training. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

PDF (12MB)

Abstract

The distributed training of deep learning models faces two issues: efficiency and privacy. First, training can be slow and inefficient, especially when the model is large and the data are distributed across multiple devices. For model parallelism, the inefficiency stems from the forward locking, backward locking, and update locking problems of the backpropagation algorithm. Existing acceleration methods either address only one of these locking problems or incur severe accuracy loss or memory inefficiency; moreover, none of them consider the straggler problem among devices. We propose Layer-wise Staleness and a novel efficient training algorithm, Diversely Stale Parameters (DSP), to address these challenges.
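A minimal sketch of why these lockings arise in naive model parallelism (this illustrates the problem setting only, not the proposed DSP algorithm; the two-stage split, layer sizes, and CPU placement are illustrative assumptions):

    import torch
    import torch.nn as nn

    # Model split into two stages, e.g. one per device under model parallelism.
    stage1 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())   # device 0 (assumed)
    stage2 = nn.Sequential(nn.Linear(64, 10))               # device 1 (assumed)
    opt1 = torch.optim.SGD(stage1.parameters(), lr=0.1)
    opt2 = torch.optim.SGD(stage2.parameters(), lr=0.1)

    x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))

    h = stage1(x)                  # stage 2 idles until stage 1 finishes: forward locking
    loss = nn.functional.cross_entropy(stage2(h), y)
    loss.backward()                # gradients flow stage 2 -> stage 1: backward locking
    opt2.step(); opt1.step()       # no update until the full backward pass ends: update locking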

For data parallelism, the communication bottleneck is a critical problem in large-scale distributed deep learning. We study distributed SGD with random block-wise sparsification as the gradient compressor, which is ring-allreduce compatible and highly computation-efficient but leads to inferior performance. To tackle this issue, we propose a new detached error feedback (DEF) algorithm, which achieves a better convergence bound than error feedback for non-convex problems.
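A minimal sketch of the compression setting described above, pairing random block-wise sparsification with classical error feedback (this is the baseline being improved upon, not the proposed DEF algorithm; the block count, vector size, and shared-seed remark are illustrative assumptions):

    import torch

    def random_block_sparsify(grad: torch.Tensor, num_blocks: int = 4) -> torch.Tensor:
        """Keep one randomly chosen contiguous block of the flattened gradient, zero the rest."""
        flat = grad.flatten()
        block_size = (flat.numel() + num_blocks - 1) // num_blocks
        start = torch.randint(num_blocks, (1,)).item() * block_size
        out = torch.zeros_like(flat)
        out[start:start + block_size] = flat[start:start + block_size]
        return out.view_as(grad)

    error = torch.zeros(1024)                    # per-worker residual buffer
    for step in range(3):
        grad = torch.randn(1024)                 # stand-in for a local stochastic gradient
        corrected = grad + error                 # re-inject previously dropped coordinates
        sent = random_block_sparsify(corrected)  # what is actually communicated
        error = corrected - sent                 # remember what was discarded
        # `sent` would be averaged across workers; if all workers draw the same block
        # (e.g. via a shared seed), only that block needs to go through ring-allreduce.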

Second, distributed training raises data privacy concerns when users' data are gathered at a central server. To preserve data privacy, cross-silo federated learning (FL) has attracted much attention. However, there can be a generalization gap between a model trained with FL and one trained centrally. We propose a novel training framework, FedSM, that avoids the client drift issue and, for the first time, closes the generalization gap with centralized training on medical image segmentation tasks.

Communication efficiency is also crucial for federated learning (FL). Performing multiple local training steps on clients to reduce the communication frequency is a common way to address this issue, but this strategy leads to the client drift problem under non-i.i.d. data distributions. We propose a new method that improves training performance by maintaining double momentum buffers.
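A minimal FedAvg-style sketch of local training steps between communication rounds, the setting in which client drift arises under non-i.i.d. data (this shows the common baseline only, not the proposed double-momentum method; the model size, client count, local step count, and learning rate are illustrative assumptions):

    import copy
    import torch
    import torch.nn as nn

    global_model = nn.Linear(16, 2)
    clients = [(torch.randn(32, 16), torch.randint(0, 2, (32,))) for _ in range(4)]

    for rnd in range(2):                                   # communication rounds
        client_states = []
        for x, y in clients:
            local = copy.deepcopy(global_model)            # each client starts from the global model
            opt = torch.optim.SGD(local.parameters(), lr=0.05)
            for _ in range(5):                             # local steps with no communication
                opt.zero_grad()
                nn.functional.cross_entropy(local(x), y).backward()
                opt.step()
            client_states.append(local.state_dict())
        # The server averages the client models once per round; with non-i.i.d. data
        # the local updates pull in different directions, which is the client drift problem.
        avg = {k: torch.stack([s[k] for s in client_states]).mean(dim=0) for k in client_states[0]}
        global_model.load_state_dict(avg)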



Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
    Xu, An | Email: anx6@pitt.edu | Pitt Username: anx6 | ORCID: 0000-0001-7480-5010
ETD Committee:
    Committee Chair: Zeng, Bo (bzeng@pitt.edu)
    Committee Member: Dalla, Ahmed (AHD12@pitt.edu)
    Committee Member: Barati, Masoud (masoud.barati@pitt.edu)
    Committee Member: Mao, Zhi-Hong (zhm4@pitt.edu)
    Committee Member: Sun, Mingui (drsun@pitt.edu)
Date: 14 September 2023
Date Type: Publication
Defense Date: 14 July 2023
Approval Date: 14 September 2023
Submission Date: 8 July 2023
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 157
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: distributed training, model parallelism, data parallelism, federated learning, communication efficiency.
Date Deposited: 14 Sep 2023 13:41
Last Modified: 14 Sep 2023 13:41
URI: http://d-scholarship.pitt.edu/id/eprint/45074
