TinyOps: ImageNet Scale Deep Learning on Microcontrollers
Sadiq, Sulaiman, Hare, Jonathon, Maji, Partha, Craske, Simon and Merrett, Geoff (2022) TinyOps: ImageNet Scale Deep Learning on Microcontrollers. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2022). (In Press)
Record type: Conference or Workshop Item (Paper)
Abstract
Deep learning on microcontroller (MCU) based IoT devices is extremely challenging due to memory constraints. Prior approaches use either internal memory or external memory exclusively, which limits either accuracy or latency. We find that a hybrid method using both internal and external MCU memories outperforms either approach alone in accuracy and latency. We develop TinyOps, an inference engine that reduces the inference latency of models stored in slow external memory through a partitioning and overlaying scheme built on the widely available Direct Memory Access (DMA) peripheral, combining the advantages of external memory (size) and internal memory (speed). Experimental results show that architectures deployed with TinyOps significantly outperform models designed for internal memory, with up to 6% higher accuracy and, importantly, 1.3-2.2x faster inference latency, setting the state of the art in TinyML ImageNet classification. Our work shows that the TinyOps design space is more efficient than the internal or external memory design spaces and should be explored further for TinyML applications.
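The partition-and-overlay scheme described in the abstract amounts to a double-buffering pattern: while the CPU computes on one model partition held in fast internal SRAM, the DMA engine prefetches the next partition's weights from slow external memory, hiding the external-memory latency behind compute. The C sketch below illustrates this idea under stated assumptions; dma_start/dma_wait, part_weights, run_partition, the partition count, and the buffer size are hypothetical placeholders, not the paper's actual API or implementation.

/* Minimal sketch of DMA-based weight overlaying, assuming a
 * hypothetical HAL (dma_start/dma_wait) and per-partition kernels.
 * Assumes every partition fits within BUF_BYTES of internal SRAM. */
#include <stdint.h>
#include <stddef.h>

#define NUM_PARTITIONS 8
#define BUF_BYTES (64 * 1024)        /* internal SRAM working buffers */

/* Hypothetical HAL: begin an asynchronous external-memory -> SRAM
 * copy, and block until the in-flight copy completes. */
extern void dma_start(void *dst, const void *src, size_t len);
extern void dma_wait(void);

/* Weights for each model partition, resident in external memory. */
extern const uint8_t *part_weights[NUM_PARTITIONS];
extern const size_t   part_bytes[NUM_PARTITIONS];

/* Run one partition's layers using weights already in internal SRAM. */
extern void run_partition(int idx, const uint8_t *weights);

static uint8_t bufs[2][BUF_BYTES];   /* double buffer in internal SRAM */

void overlay_inference(void)
{
    /* Prefetch partition 0 before compute begins. */
    dma_start(bufs[0], part_weights[0], part_bytes[0]);
    dma_wait();

    for (int i = 0; i < NUM_PARTITIONS; i++) {
        int cur = i & 1;
        /* Overlay: fetch the next partition's weights while the CPU
         * computes on the current one. */
        if (i + 1 < NUM_PARTITIONS)
            dma_start(bufs[cur ^ 1], part_weights[i + 1], part_bytes[i + 1]);

        run_partition(i, bufs[cur]);

        if (i + 1 < NUM_PARTITIONS)
            dma_wait();              /* ensure the next overlay has landed */
    }
}

In this pattern the per-partition DMA transfer time is overlapped with the compute time of the preceding partition, which is how an external-memory model can approach internal-memory latency, consistent with the speedups the abstract reports.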
Text: Sadiq_TinyOps_ImageNet_Scale_Deep_Learning_on_Microcontrollers_CVPRW_2022_paper - Version of Record
More information
Accepted/In Press date: 2022
Venue - Dates: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2022), 2022-06-19
Identifiers
Local EPrints ID: 456880
URI: http://eprints.soton.ac.uk/id/eprint/456880
PURE UUID: 9810d425-34fc-428a-b14e-80e62fe133d4
Catalogue record
Date deposited: 16 May 2022 16:30
Last modified: 17 Mar 2024 03:05
Contributors
Author: Sulaiman Sadiq
Author: Jonathon Hare
Author: Partha Maji
Author: Simon Craske
Author: Geoff Merrett