Dynamic transformer for efficient machine translation on embedded devices
Parry, Hishan, Xun, Lei, Sabetsarvestani, Mohammadamin, Bi, Jia, Hare, Jonathon and Merrett, Geoff (2021) Dynamic transformer for efficient machine translation on embedded devices. In 3rd ACM/IEEE Workshop on Machine Learning for CAD (MLCAD 2021). 6 pp.
Record type: Conference or Workshop Item (Paper)
Abstract
The Transformer architecture is widely used for machine translation tasks. However, its resource-intensive nature makes it challenging to deploy on constrained embedded devices, particularly where the available hardware resources can vary at run-time. We propose a dynamic machine translation model that scales the Transformer architecture according to the resources available at any given time. The proposed approach, 'Dynamic-HAT', uses a HAT SuperTransformer as the backbone to search for SubTransformers with different accuracy-latency trade-offs at design time. The optimal SubTransformers are sampled from the SuperTransformer at run-time, depending on the current latency constraint. Dynamic-HAT is evaluated on the Jetson Nano, where it uses inherited SubTransformers sampled directly from the SuperTransformer, achieving a switching time of under 1 s. Using inherited SubTransformers incurs a BLEU score loss of less than 1.5% because the SubTransformer configuration is not retrained from scratch after sampling. To recover this loss in performance, the dimensions of the design space can be reduced to tailor it to a family of target hardware. The reduced design space yields a BLEU score increase of approximately 1% for sub-optimal models from the original design space, while providing a wide performance-scaling range of 0.356 s to 1.526 s on the GPU and 2.9 s to 7.31 s on the CPU.
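The sketch below (Python) illustrates the run-time selection step described in the abstract: from a design-time table of SubTransformer configurations, each profiled for latency on the target device and evaluated for BLEU, the configuration with the best BLEU that fits the current latency budget is chosen and its weights are inherited from the SuperTransformer rather than retrained. This is a minimal illustration only; the function names, configuration fields, and numbers are assumptions for exposition, not the authors' implementation or the HAT API.

# Minimal sketch of run-time SubTransformer selection (illustrative only;
# names, fields, and numbers are assumptions, not the Dynamic-HAT API).
# Design time: build a table of SubTransformer configurations, each profiled
# for latency on the target device and evaluated for BLEU.
# Run time: pick the best configuration that fits the current latency budget;
# its weights are inherited (sliced) from the SuperTransformer, so switching
# avoids retraining and remains fast.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubTransformerConfig:
    encoder_layers: int   # number of encoder layers kept
    decoder_layers: int   # number of decoder layers kept
    embed_dim: int        # embedding width sliced from the SuperTransformer
    latency_s: float      # latency measured on the target device (design time)
    bleu: float           # validation BLEU of the inherited SubTransformer


def select_subtransformer(table: List[SubTransformerConfig],
                          latency_budget_s: float) -> Optional[SubTransformerConfig]:
    """Return the highest-BLEU configuration whose profiled latency fits the budget."""
    feasible = [c for c in table if c.latency_s <= latency_budget_s]
    if not feasible:
        return None  # no SubTransformer meets the current constraint
    return max(feasible, key=lambda c: c.bleu)


# Hypothetical design-time table (numbers are illustrative).
design_space = [
    SubTransformerConfig(6, 6, 640, latency_s=1.45, bleu=27.8),
    SubTransformerConfig(6, 4, 512, latency_s=0.95, bleu=27.1),
    SubTransformerConfig(4, 3, 512, latency_s=0.60, bleu=26.4),
    SubTransformerConfig(3, 2, 384, latency_s=0.36, bleu=25.2),
]

# Run time: the latency budget changes as available hardware resources vary.
cfg = select_subtransformer(design_space, latency_budget_s=0.7)
if cfg is not None:
    print(f"Switching to SubTransformer: {cfg.encoder_layers}x{cfg.decoder_layers}, "
          f"dim {cfg.embed_dim}, ~{cfg.latency_s:.2f}s, BLEU {cfg.bleu}")

Because the table lookup and weight inheritance are cheap relative to retraining, this kind of selection can plausibly be done each time the latency constraint changes, which is consistent with the sub-second switching time reported in the abstract.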
Text: Dynamic-HAT_MLCAD 2021_Accepted - Accepted Manuscript
More information
Accepted/In Press date: 16 July 2021
Published date: 9 September 2021
Identifiers
Local EPrints ID: 450548
URI: http://eprints.soton.ac.uk/id/eprint/450548
PURE UUID: efe5d70d-6b5a-479b-8f09-fd182b1746ad
Catalogue record
Date deposited: 03 Aug 2021 16:31
Last modified: 17 Mar 2024 03:05
Contributors
Author: Hishan Parry
Author: Lei Xun
Author: Mohammadamin Sabetsarvestani
Author: Jia Bi
Author: Jonathon Hare
Author: Geoff Merrett