University of Southampton Institutional Repository

Accuracy to throughput trade-offs for reduced precision neural networks on reconfigurable logic

Su, Jiang, Fraser, Nicholas J., Gambardella, Giulio, Blott, Michaela, Durelli, Gianluca, Thomas, David B., Leong, Philip H.W. and Cheung, Peter Y.K. (2018) Accuracy to throughput trade-offs for reduced precision neural networks on reconfigurable logic. Voros, Nikolaos, Keramidas, Georgios, Antonopoulos, Christos, Huebner, Michael, Diniz, Pedro C. and Goehringer, Diana (eds.) In Applied Reconfigurable Computing: Architectures, Tools, and Applications - 14th International Symposium, ARC 2018, Proceedings. vol. 10824 LNCS, Springer. pp. 29-42. (doi:10.1007/978-3-319-78890-6_3).

Record type: Conference or Workshop Item (Paper)

Abstract

Modern Convolutional Neural Networks (CNNs) are typically implemented using floating-point linear algebra. Recently, reduced precision Neural Networks (NNs) have been gaining popularity as they require significantly less memory and computational resources than floating-point implementations. This is particularly important in power-constrained compute environments. However, in many cases a reduction in precision comes at a small cost to the accuracy of the resultant network. In this work, we investigate the accuracy-throughput trade-off for various parameter precisions applied to different types of NN models. We first propose a quantization training strategy that allows reduced precision NN inference with a lower memory footprint and competitive model accuracy. Then, we quantitatively formulate the relationship between data representation and hardware efficiency. Finally, our experiments provide insightful observations. For example, one of our tests shows that 32-bit floating point is more hardware efficient than 1-bit parameters for achieving 99% MNIST accuracy. In general, within our tested problem domain, 2-bit and 4-bit fixed-point parameters show better hardware trade-offs on small-scale datasets such as MNIST and CIFAR-10, while 4-bit parameters provide the best trade-off in large-scale tasks such as AlexNet on the ImageNet dataset.
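To illustrate the kind of parameter quantization the abstract refers to, the sketch below (Python/NumPy) maps a full-precision weight array to k-bit signed fixed point. It is a minimal, generic illustration only, assuming uniform quantization into [-1, 1) and sign-based 1-bit weights; it does not reproduce the authors' actual training strategy, and the function name quantize_weights and the chosen bit widths are hypothetical.

    import numpy as np

    def quantize_weights(w, num_bits):
        """Quantize a weight array to `num_bits`-bit signed fixed point in [-1, 1)."""
        if num_bits >= 32:
            # Treat 32 bits as full-precision floating point (no quantization).
            return w
        if num_bits == 1:
            # 1-bit (binary) weights: sign of the full-precision value, {-1, +1}.
            return np.where(w >= 0.0, 1.0, -1.0)
        levels = 2.0 ** (num_bits - 1)
        w_clipped = np.clip(w, -1.0, 1.0 - 1.0 / levels)
        return np.round(w_clipped * levels) / levels

    # Example: compare 1-, 2- and 4-bit representations of the same weights.
    # In quantization-aware training, the quantized copy is used in the forward
    # pass while gradient updates are applied to the full-precision weights
    # (straight-through estimator); that training loop is omitted here.
    w_fp32 = np.random.uniform(-1.0, 1.0, size=(3, 3)).astype(np.float32)
    for bits in (1, 2, 4):
        print(bits, "bits:\n", quantize_weights(w_fp32, bits))

Lower bit widths reduce the memory footprint and the cost of each multiply-accumulate on reconfigurable logic, which is the hardware-efficiency side of the accuracy-throughput trade-off studied in the paper.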

This record has no associated files available for download.

More information

Published date: 8 April 2018
Additional Information: Funding Information: The authors from Imperial College London would like to acknowledge the support of the UK's research council (RCUK) with the following grants: EP/K034448, P010040 and N031768. The authors from The University of Sydney acknowledge support from the Australian Research Council Linkage Project LP130101034. Publisher Copyright: © Springer International Publishing AG, part of Springer Nature 2018. Copyright: Copyright 2018 Elsevier B.V., All rights reserved.
Venue - Dates: 14th International Symposium on Applied Reconfigurable Computing, ARC 2018, Santorini, Greece, 2018-05-02 - 2018-05-04
Keywords: Algorithm acceleration, FPGA, Neural networks, Reduced precision

Identifiers

Local EPrints ID: 453676
URI: http://eprints.soton.ac.uk/id/eprint/453676
ISSN: 0302-9743
PURE UUID: a4918b95-606f-4fdd-af1c-835ad2f802b4
ORCID for David B. Thomas: orcid.org/0000-0002-9671-0917

Catalogue record

Date deposited: 20 Jan 2022 17:45
Last modified: 18 Mar 2024 04:04

Contributors

Author: Jiang Su
Author: Nicholas J. Fraser
Author: Giulio Gambardella
Author: Michaela Blott
Author: Gianluca Durelli
Author: David B. Thomas
Author: Philip H.W. Leong
Author: Peter Y.K. Cheung
Editor: Nikolaos Voros
Editor: Georgios Keramidas
Editor: Christos Antonopoulos
Editor: Michael Huebner
Editor: Pedro C. Diniz
Editor: Diana Goehringer
