Advanced SIMD: extending the reach of contemporary SIMD architectures
SIMD extensions have gained widespread acceptance in modern microprocessors as a way to exploit data-level parallelism in general-purpose cores. Popular SIMD architectures (e.g. Intel SSE/AVX) have evolved by adding support for wider registers and datapaths, and advanced features like indexed memory accesses, per-lane predication and inter-lane instructions, at the cost of additional silicon area and design complexity.
This paper evaluates the performance impact of such advanced features on a set of workloads considered hard to vectorize for traditional SIMD architectures. Their sensitivity to the most relevant design parameters (e.g. register/datapath width and L1 data cache configuration) is quantified and discussed.
We developed an ARMv7 NEON-based ISA extension (ARGON), augmented a cycle-accurate simulation framework for it, and derived a set of benchmarks from the Berkeley dwarfs. Our analyses demonstrate how ARGON can, depending on the structure of an algorithm, achieve speedups of 1.5x to 16x.
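The three advanced features named in the abstract (indexed memory accesses, per-lane predication and inter-lane instructions) can be illustrated with a small scalar model. The sketch below is not ARGON or NEON code and is not taken from the paper. The function names, the list-of-lanes representation and the sample data are all assumptions chosen only to show what each feature means at the level of individual lanes.

```python
# Scalar Python model of three SIMD features named in the abstract.
# All names and the list-of-lanes representation are illustrative
# assumptions; none of this is ARGON or NEON code.

def gather(memory, indices):
    """Indexed load: lane i reads memory[indices[i]]."""
    return [memory[i] for i in indices]

def predicated_add(a, b, mask):
    """Per-lane predication: lanes with a false mask keep the value from a."""
    return [x + y if m else x for x, y, m in zip(a, b, mask)]

def horizontal_sum(v):
    """Inter-lane reduction by pairwise halving; len(v) must be a power of two."""
    v = list(v)
    while len(v) > 1:
        half = len(v) // 2
        v = [v[i] + v[i + half] for i in range(half)]
    return v[0]

memory = [10, 20, 30, 40, 50, 60, 70, 80]
lanes = gather(memory, [7, 0, 3, 3])
masked = predicated_add(lanes, [1, 1, 1, 1], [True, False, True, False])
total = horizontal_sum(masked)
```

Here `gather` reads non-contiguous elements into one vector, `predicated_add` updates only the lanes whose mask bit is set, and `horizontal_sum` combines values across lanes, which an instruction set that only operates lane by lane cannot express directly.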
Boettcher, Matthias
d34d0210-df72-4f89-ad87-91d87a4f272a
Al-Hashimi, Bashir M.
0b29c671-a6d2-459c-af68-c4614dce3b5d
Eyole, Mbou
c954e758-34b7-4a65-995b-ec4f2a93ff42
Gabrielli, Giacomo
79e841c7-f0b2-48bb-81b0-9aba55fac63f
Reid, Alastair
6f778bae-124e-4299-b4f5-d024bdd487f0
Boettcher, Matthias, Al-Hashimi, Bashir M., Eyole, Mbou, Gabrielli, Giacomo and Reid, Alastair (2014) Advanced SIMD: extending the reach of contemporary SIMD architectures. Design, Automation, and Test in Europe Conference, DATE2014, Dresden, Germany. 24 - 28 Mar 2014. (In Press)
Record type: Conference or Workshop Item (Paper)
Text
matthias-date14.pdf
- Other
More information
Accepted/In Press date: March 2014
Venue - Dates:
Design, Automation, and Test in Europe Conference, DATE2014, Dresden, Germany, 2014-03-24 - 2014-03-28
Organisations:
Electronic & Software Systems
Identifiers
Local EPrints ID: 361119
URI: http://eprints.soton.ac.uk/id/eprint/361119
PURE UUID: 7dafbf15-727b-4ed6-8652-aab98dba6ff2
Catalogue record
Date deposited: 16 Jan 2014 11:04
Last modified: 14 Mar 2024 15:46
Contributors
Author: Matthias Boettcher
Author: Bashir M. Al-Hashimi
Author: Mbou Eyole
Author: Giacomo Gabrielli
Author: Alastair Reid