Evaluation of Scalability for Distributed Data-Parallel Training of Swin Transformer V2

Garrett, Dillon M. (2023) Evaluation of Scalability for Distributed Data-Parallel Training of Swin Transformer V2. Master's Thesis, University of Pittsburgh. (Unpublished)

Abstract

As recent research demonstrates, model size across deep learning has increased rapidly, helping to advance the state of the art. Larger models place greater computational demands on hardware and software computing platforms, which makes training scalability a topic of interest. Following the development of transformer-based models, it has become common practice to start from a pre-trained model and fine-tune it on a specific dataset, allowing wider adoption without full model retraining. While originally designed for natural language processing, transformers have been adapted to many other domains. Swin Transformer V2 is a transformer model for computer-vision tasks that achieved state-of-the-art semantic segmentation results. This research provides a scalability analysis for the distributed data-parallel training of Swin Transformer V2 on the semantic segmentation vision task. The ADE20K semantic segmentation dataset is used to fine-tune this model. A weak scalability experiment is designed, increasing the number of GPUs for training while holding the problem size constant. To implement this experiment, the sub-batch size is held constant at 8 images per GPU per iteration and the total number of iterations is scaled down. Training time, GPU utilization, and CPU utilization metrics for single- and multi-GPU training are measured on NVIDIA A100 SXM, NVIDIA A100 PCIe, and NVIDIA V100 PCIe GPU platforms hosted by the Center for Research Computing at the University of Pittsburgh. Training speedup and parallel efficiency metrics are calculated. For all computing platforms, training on 2 GPUs is 26% faster on average when compared to single-GPU training. However, diminishing returns are observed as more GPUs are added. When increasing the number of GPUs from 2 to 4, training is only 1.9% faster on average on NVIDIA A100 PCIe and NVIDIA V100 PCIe nodes. For NVLink-enabled NVIDIA A100 nodes, training is only 2.9% faster when increasing the number of GPUs from 4 to 8. Consequently, distributed data-parallel training of Swin Transformer V2 scales poorly as the number of devices is increased.
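
The experiment design and the two derived metrics named in the abstract can be illustrated with a minimal sketch. It assumes the conventional definitions, speedup S(N) = T(1) / T(N) and parallel efficiency E(N) = S(N) / N, which the abstract does not spell out. The single-GPU iteration count and all timings below are hypothetical placeholders, not measurements from the thesis; only the sub-batch size of 8 images per GPU comes from the abstract.

```python
PER_GPU_BATCH = 8               # images per GPU per iteration (from the abstract)
SINGLE_GPU_ITERATIONS = 16000   # hypothetical value, chosen only for illustration


def iterations_for(num_gpus, single_gpu_iterations=SINGLE_GPU_ITERATIONS):
    """Scale the iteration count down so the total number of images stays fixed.

    Assumes single_gpu_iterations is divisible by num_gpus.
    """
    return single_gpu_iterations // num_gpus


def speedup(t_single, t_multi):
    """Conventional speedup: single-GPU time divided by multi-GPU time."""
    return t_single / t_multi


def parallel_efficiency(t_single, t_multi, num_gpus):
    """Conventional parallel efficiency: speedup divided by the GPU count."""
    return speedup(t_single, t_multi) / num_gpus


# Hypothetical training times in seconds, keyed by GPU count.
timings_s = {1: 1000.0, 2: 800.0, 4: 780.0}

for n, t in timings_s.items():
    total_images = PER_GPU_BATCH * n * iterations_for(n)
    s = speedup(timings_s[1], t)
    e = parallel_efficiency(timings_s[1], t, n)
    print(f"{n} GPU(s): iterations={iterations_for(n)}, "
          f"images={total_images}, speedup={s:.2f}, efficiency={e:.2f}")
```

With the per-GPU sub-batch fixed, each added GPU raises the number of images processed per iteration, so the iteration count is reduced in proportion to keep the total work the same. An efficiency well below 1.0 at higher GPU counts corresponds to the diminishing returns the abstract describes.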

Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Garrett, Dillon M.	dmg111@pitt.edu	dmg111	0009-0006-3961-6541

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee Chair	George, Alan D.	alan.george@pitt.edu	adg91
Committee Member	Dallal, Ahmed Hassan Sayed	ahd12@pitt.edu	ahd12
Committee Member	Dickerson, Samuel J.	dickerson@pitt.edu	sjdst31

Date:

14 September 2023

Date Type:

Publication

Defense Date:

27 April 2023

Approval Date:

14 September 2023

Submission Date:

8 May 2023

Access Restriction:

No restriction; Release the ETD for access worldwide immediately.

Number of Pages:

Institution:

University of Pittsburgh

Schools and Programs:

Swanson School of Engineering > Electrical and Computer Engineering

Degree:

MS - Master of Science

Thesis Type:

Master's Thesis

Refereed:

Yes

Uncontrolled Keywords:

high-performance computing, machine learning, GPU, transformer, training

Date Deposited:

14 Sep 2023 13:33

Last Modified:

14 Sep 2023 13:33

URI:

http://d-scholarship.pitt.edu/id/eprint/44738
