The University of Southampton
University of Southampton Institutional Repository

Dynamic transformer for efficient machine translation on embedded devices

Hishan Parry, Lei Xun, Mohammadamin Sabetsarvestani, Jia Bi, Jonathon Hare, Geoff Merrett

Parry, Hishan, Xun, Lei, Sabetsarvestani, Mohammadamin, Bi, Jia, Hare, Jonathon and Merrett, Geoff (2021) Dynamic transformer for efficient machine translation on embedded devices. In 3rd ACM/IEEE Workshop on Machine Learning for CAD (MLCAD'21). 6 pp. (In Press)

Record type: Conference or Workshop Item (Paper)

Abstract

The Transformer architecture is widely used for machine translation tasks. However, its resource-intensive nature makes it challenging to implement on constrained embedded devices, particularly where the available hardware resources can vary at run-time. We propose a dynamic machine translation model that scales the Transformer architecture based on the resources available at any particular time. The proposed approach, 'Dynamic-HAT', uses a HAT SuperTransformer as the backbone to search for SubTransformers with different accuracy-latency trade-offs at design time. At run-time, the optimal SubTransformers are sampled from the SuperTransformer according to the latency constraints. Dynamic-HAT is tested on the Jetson Nano and uses inherited SubTransformers sampled directly from the SuperTransformer, with a switching time of <1 s. Because the SubTransformer configuration is not retrained from scratch after sampling, using inherited SubTransformers results in a BLEU score loss of <1.5%. To recover this loss in performance, the dimensions of the design space can be reduced to tailor it to a family of target hardware. The reduced design space results in a BLEU score increase of approximately 1% for sub-optimal models from the original design space, with a wide performance scaling range of 0.356 s to 1.526 s on the GPU and 2.9 s to 7.31 s on the CPU.
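
The run-time selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `SubTransformerProfile`, `select_subtransformer` and `GPU_PROFILES` are hypothetical, the quality field is a placeholder rank rather than a BLEU score, and only the two end-point latencies (0.356 s and 1.526 s) are taken from the abstract. The sketch assumes that each SubTransformer found at design time has been profiled on the target device, so that at run-time the best candidate fitting the current latency budget can be looked up.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SubTransformerProfile:
    """Hypothetical design-time record for one SubTransformer."""
    name: str
    latency_s: float   # latency measured on the target device, in seconds
    quality_rank: int  # higher is better; a placeholder rank, not a BLEU score


def select_subtransformer(
    profiles: Sequence[SubTransformerProfile],
    latency_budget_s: float,
) -> Optional[SubTransformerProfile]:
    """Return the best-ranked profile whose latency fits the budget, or None."""
    feasible = [p for p in profiles if p.latency_s <= latency_budget_s]
    if not feasible:
        return None
    # Prefer higher quality; break ties in favour of the lower latency.
    return max(feasible, key=lambda p: (p.quality_rank, -p.latency_s))


# Placeholder table. Only the two end-point latencies come from the abstract
# (GPU range); the middle entry and all ranks are illustrative assumptions.
GPU_PROFILES = [
    SubTransformerProfile("small", 0.356, 1),
    SubTransformerProfile("medium", 0.9, 2),
    SubTransformerProfile("large", 1.526, 3),
]

if __name__ == "__main__":
    choice = select_subtransformer(GPU_PROFILES, latency_budget_s=1.0)
    print(choice.name if choice else "no feasible SubTransformer")
```

In this sketch a budget below the smallest profiled latency returns `None`, leaving the fallback policy to the caller; how Dynamic-HAT handles that case is not stated in the abstract.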

Text
Dynamic-HAT_MLCAD 2021_Accepted - Accepted Manuscript
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 16 July 2021

Identifiers

Local EPrints ID: 450548
URI: http://eprints.soton.ac.uk/id/eprint/450548
PURE UUID: efe5d70d-6b5a-479b-8f09-fd182b1746ad
ORCID for Jonathon Hare: orcid.org/0000-0003-2921-4283
ORCID for Geoff Merrett: orcid.org/0000-0003-4980-3894

Catalogue record

Date deposited: 03 Aug 2021 16:31
Last modified: 04 Aug 2021 01:39

Contributors

Author: Hishan Parry
Author: Lei Xun
Author: Mohammadamin Sabetsarvestani
Author: Jia Bi
Author: Jonathon Hare
Author: Geoff Merrett
