The University of Southampton
University of Southampton Institutional Repository

Enabling on-device smartphone GPU based training: Lessons Learned




Das, Anish, Kwon, Young D., Chauhan, Jagmohan and Mascolo, Cecilia (2022) Enabling on-device smartphone GPU based training: Lessons Learned. In 2022 IEEE International Conference on Pervasive Computing and Communications Workshops and Other Affiliated Events, PerCom Workshops 2022. IEEE. pp. 533-538. (doi:10.1109/PerComWorkshops53856.2022.9767442).

Record type: Conference or Workshop Item (Paper)

Abstract

Deep Learning (DL) has shown impressive performance in many mobile applications. Most existing work has focused on reducing the computational and resource overheads of running Deep Neural Network (DNN) inference on resource-constrained mobile devices. However, the other aspect of DNN operations, i.e. training (forward and backward passes) on smartphone GPUs, has received little attention thus far. To this end, we conduct an initial analysis to examine the feasibility of on-device training on smartphones using mobile GPUs. We first employ an open-source mobile DL framework (MNN) and its OpenCL backend for running compute kernels on GPUs. Next, we observe that training on CPUs is much faster than on GPUs and identify two possible bottlenecks related to this observation: (i) computation and (ii) memory. To address the computation bottleneck, we optimize the OpenCL backend's kernels, showing 2x improvements (40-70 GFLOPS) over CPUs (15-30 GFLOPS) on Snapdragon 8 series processors. However, we find that full DNN training is still much slower on GPUs than on CPUs, indicating that the memory bottleneck plays a significant role in the lower performance of GPUs relative to CPUs. Data movement takes almost 91% of training time due to low bandwidth. Lastly, based on the findings and failures during our investigation, we present limitations and practical guidelines for future directions.

This record has no associated files available for download.

More information

Published date: 2022
Venue - Dates: 2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events, PerCom Workshops 2022, Pisa, Italy, 2022-03-21 - 2022-03-25
Keywords: GPU, OpenCL, Smartphones, Training

Identifiers

Local EPrints ID: 491463
URI: http://eprints.soton.ac.uk/id/eprint/491463
PURE UUID: 83b6729d-21bc-44b4-8948-6a98caf6a541

Catalogue record

Date deposited: 24 Jun 2024 17:02
Last modified: 24 Jun 2024 17:02


Contributors

Author: Anish Das
Author: Young D. Kwon
Author: Jagmohan Chauhan
Author: Cecilia Mascolo
