Tang, Yue
(2024)
Efficient Hardware and Software Design for On-Device Learning of Video Streams.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Video detection has been widely applied in edge devices with heterogeneous accelerators for video pre-processing and real-time inference. Our work targets video detection in an elderly care robotic application. When an elderly person falls, a DNN model must both recognize the action and localize the temporal position where it happens. This task is called temporal action localization (TAL). Currently, a TAL DNN is first trained in a centralized cloud and then deployed to edge devices. However, when devices enter a new environment, the pre-trained model must be updated with online streaming data to improve accuracy. To adapt to new environments with less time consumption while protecting users’ privacy, it is desirable for the models to learn continuously and directly from local data on the device.
However, existing software and hardware systems are designed mainly for inference, with training not a primary concern. To enable efficient on-device learning from video, three main challenges need to be solved. The first is how to develop practical software algorithms that can improve the TAL model directly from a single long on-device video stream rather than from pre-divided video datasets collected in the cloud. The second is how to implement the algorithm on individual resource-limited edge devices, given high computational complexity and intra-device communication bottlenecks. The third is how heterogeneous accelerators on the devices can collaborate for efficient training, given the computational complexity across different accelerators and the inter-device communication bottleneck. The first challenge is from the software perspective, while the latter two are from the hardware perspective.
To address the first challenge, we have developed a weakly-supervised on-device learning framework for streaming videos that contain actions of all classes; it does not require any laborious manual pre-segmentation. To address the second challenge, we have developed an efficient DNN training accelerator that achieves end-to-end training on a single resource-limited, low-power edge device. For the third challenge, we propose a multi-accelerator training algorithm that enables efficient CNN training on edge devices with heterogeneous training accelerators.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Date: 6 September 2024
Date Type: Publication
Defense Date: 17 June 2024
Approval Date: 6 September 2024
Submission Date: 26 June 2024
Access Restriction: 2 year -- Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 157
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical and Computer Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: on-device training, single long video stream
Date Deposited: 06 Sep 2024 19:56
Last Modified: 06 Sep 2024 19:56
URI: http://d-scholarship.pitt.edu/id/eprint/46631