New Efficient and Privacy-Preserving Methods for Distributed Training

Xu, An (2023) New Efficient and Privacy-Preserving Methods for Distributed Training. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract

The distributed training of deep learning models faces two issues: efficiency and privacy. First of all, training can be slow and inefficient, especially when the model is large and the data is distributed across multiple devices. For model parallelism, the inefficiency is caused by the forward locking, backward locking, and update locking problems of the backpropagation algorithm. Existing acceleration methods either handle only one locking problem or lead to severe accuracy loss or memory inefficiency, and none of them consider the straggler problem among devices. We propose Layer-wise Staleness and a novel efficient training algorithm, Diversely Stale Parameters (DSP), to address these challenges. For data parallelism, the communication bottleneck has been a critical problem in large-scale distributed deep learning. We study distributed SGD with random block-wise sparsification as the gradient compressor, which is ring-allreduce compatible and highly computation-efficient but leads to inferior performance. To tackle this issue, we propose a new detached error feedback (DEF) algorithm, which achieves a better convergence bound than error feedback for non-convex problems.

Secondly, distributed training raises data privacy concerns when users' data is gathered at a central server. To preserve data privacy, cross-silo federated learning (FL) has attracted much attention. However, there can be a generalization gap between a model trained with FL and one trained centrally. We propose a novel training framework, FedSM, to avoid the client drift issue and, for the first time, close the generalization gap with centralized training on medical image segmentation tasks. Communication efficiency is also crucial for FL. Performing local training steps on clients to reduce communication frequency is a common way to address this issue, but it leads to the client drift problem due to non-i.i.d. data distributions. We propose a new method that improves training performance by maintaining double momentum buffers.
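For illustration, the following is a minimal sketch of distributed SGD with random block-wise sparsification and standard error feedback, i.e. the baseline scheme the abstract says DEF improves upon. The function names (block_sparsify, ef_sgd_step), block sizes, and the single-process simulation are assumptions made for this sketch, not the author's implementation, and the dissertation's DEF algorithm modifies this error-feedback update.

import numpy as np

def block_sparsify(grad, num_blocks, keep_blocks, rng):
    # Randomly keep `keep_blocks` of `num_blocks` equal-sized blocks of the gradient.
    # Block-wise sparsification is ring-allreduce compatible when all workers
    # share the random seed and therefore keep the same block indices.
    flat = grad.ravel()
    blocks = np.array_split(np.arange(flat.size), num_blocks)
    kept = rng.choice(num_blocks, size=keep_blocks, replace=False)
    mask = np.zeros(flat.size, dtype=bool)
    for b in kept:
        mask[blocks[b]] = True
    return np.where(mask, flat, 0.0).reshape(grad.shape)

def ef_sgd_step(param, grad, error, lr, num_blocks, keep_blocks, rng):
    # One SGD step with error feedback: the residual dropped by the compressor
    # is stored and added back to the next gradient before compressing.
    corrected = grad + error
    compressed = block_sparsify(corrected, num_blocks, keep_blocks, rng)
    error = corrected - compressed          # what was dropped this step
    param = param - lr * compressed         # apply the sparse update
    return param, error

# Illustrative single-worker usage; in practice each worker compresses its own
# gradient and the kept blocks are averaged with ring-allreduce.
rng = np.random.default_rng(0)
param = np.zeros(16)
error = np.zeros(16)
for step in range(3):
    grad = rng.normal(size=16)              # stand-in for a stochastic gradient
    param, error = ef_sgd_step(param, grad, error, lr=0.1, num_blocks=4, keep_blocks=1, rng=rng)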