The University of Southampton
University of Southampton Institutional Repository

FPGA based adaptive hardware acceleration for multiple deep learning tasks


Lu, Yufan, Zhai, Xiaojun, Saha, Sangeet, Ehsan, Shoaib and McDonald-Maier, Klaus D. (2022) FPGA based adaptive hardware acceleration for multiple deep learning tasks. In 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC). 6 pp. (doi:10.1109/MCSoC51149.2021.00038).

Record type: Conference or Workshop Item (Paper)

Abstract

Machine learning, and in particular deep learning (DL), has seen strong success in a wide variety of applications, e.g. object detection, image classification and self-driving. However, because of limits on hardware resources and power consumption, deploying deep learning algorithms on resource-constrained mobile and embedded systems remains challenging, especially for systems that run multiple DL algorithms for a variety of tasks. In this paper, an adaptive hardware resource management system, implemented on field-programmable gate arrays (FPGAs), is proposed to dynamically manage the on-chip hardware resources (e.g. LUTs, BRAMs and DSPs) to adapt to a variety of tasks. Using dynamic function exchange (DFX) technology, the system can dynamically allocate hardware resources to deploy deep learning processing units (DPUs) so as to balance the requirements, performance and power consumption of the deep learning applications. The prototype is implemented on Xilinx Zynq UltraScale+ series chips. The experimental results indicate that the proposed scheme significantly improves the computing efficiency of resource-constrained systems under various experimental scenarios. Compared to the baseline, the proposed strategy consumes 38% and 82% of the baseline power in low-workload and high-workload cases, respectively. Typically, the proposed system can save approximately 75.8% of energy.
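
The trade-off described in the abstract, matching the deployed DPU resources to the current workload, can be illustrated with a minimal sketch. This is not the authors' implementation: the configuration names, throughput figures and power figures below are hypothetical placeholders, and the selection rule (lowest-power configuration that still meets the demand) is an assumed policy chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DpuConfig:
    name: str          # hypothetical label for a reconfigurable DPU variant
    throughput: float  # frames per second it can sustain (made-up value)
    power_w: float     # power draw in watts (made-up value)


# Hypothetical configurations; the numbers are illustrative, not measured.
CONFIGS = [
    DpuConfig("small", 10.0, 2.0),
    DpuConfig("medium", 30.0, 4.5),
    DpuConfig("large", 60.0, 8.0),
]


def select_config(required_fps, configs=CONFIGS):
    """Pick the lowest-power configuration that meets the demand.

    If no configuration is fast enough, fall back to the fastest one.
    """
    feasible = [c for c in configs if c.throughput >= required_fps]
    if feasible:
        return min(feasible, key=lambda c: c.power_w)
    return max(configs, key=lambda c: c.throughput)


if __name__ == "__main__":
    for demand in (5, 25, 100):
        print(demand, select_config(demand).name)
```

In a real system the selected configuration would be loaded through partial reconfiguration; that step is omitted here because it depends on the vendor toolchain and the board.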

This record has no associated files available for download.

More information

e-pub ahead of print date: 4 February 2022
Keywords: Deep learning processing unit, dynamic function exchange, DFX, deep learning, partial reconfiguration, embedded systems

Identifiers

Local EPrints ID: 473496
URI: http://eprints.soton.ac.uk/id/eprint/473496
PURE UUID: 0500b4e3-3a39-4845-a57a-7b2a1a270987
ORCID for Shoaib Ehsan: orcid.org/0000-0001-9631-1898

Catalogue record

Date deposited: 20 Jan 2023 17:53
Last modified: 17 Mar 2024 04:16

Contributors

Author: Yufan Lu
Author: Xiaojun Zhai
Author: Sangeet Saha
Author: Shoaib Ehsan
Author: Klaus D. McDonald-Maier
