General hardware multicasting for fine-grained message-passing architectures
Manycore architectures are increasingly favouring message-passing or partitioned global address spaces (PGAS) over cache coherency for reasons of power efficiency and scalability. However, in the absence of cache coherency, there can be a lack of hardware support for one-to-many communication patterns, which are prevalent in some application domains. To address this, we present new hardware primitives for multicast communication in rack-scale manycore systems. These primitives guarantee delivery to both colocated and distributed destinations, and can capture large unstructured communication patterns precisely. As a result, reliable multicast transfers among any number of software tasks, connected in any topology, can be fully offloaded to hardware. We implement the new primitives in a research platform consisting of 50K RISC-V threads distributed over 48 FPGAs, and demonstrate significant performance benefits on a range of applications expressed using a high-level vertex-centric programming model.
126-133
Naylor, Matthew
7f0b28f7-b50f-40cc-b3ed-3821e5ef1230
Moore, Simon W.
e9f2be21-1fa3-43aa-a3e2-fc8519f97a00
Thomas, David
5701997d-7de3-4e57-a802-ea2bd3e6ab6c
Beaumont, Jonathan R.
468f446e-2cff-4285-a490-44e634f468c0
Fleming, Shane
1a7f7be0-0c3f-4125-9298-5b5a6e0bc76e
Vousden, Mark
d45312dd-a46f-4376-89f4-38b1ac8957c9
Markettos, A. Theodore
76ebcf7c-05b3-4560-ba47-23eeb6f7787b
Bytheway, Thomas
95af7e4b-5daf-4fc4-b6e5-96a3b9f95b4c
Brown, Andrew
5c19e523-65ec-499b-9e7c-91522017d7e0
10 March 2021
Naylor, Matthew, Moore, Simon W., Thomas, David, Beaumont, Jonathan R., Fleming, Shane, Vousden, Mark, Markettos, A. Theodore, Bytheway, Thomas and Brown, Andrew (2021) General hardware multicasting for fine-grained message-passing architectures. In Proceedings - 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2021. IEEE. pp. 126-133. (doi:10.1109/PDP52278.2021.00028).
Record type: Conference or Workshop Item (Paper)
This record has no associated files available for download.
More information
Published date: 10 March 2021
Additional Information:
Funding Information:
In this work, we have designed, implemented, and evaluated new techniques for hardware multicasting that support both colocated and distributed destinations. These techniques preserve an event-driven API with software-exposed flow control, two main features of the message-passing paradigm. To our knowledge, they are the first such techniques capable of capturing large unstructured communication patterns precisely. All this has been done in a whole-system context, from low-level microarchitecture to high-level architecture-agnostic application development, and has been demonstrated on a range of realistic applications. We hope these experiences will serve the future development of manycore architectures, in an era where power efficiency and scalability become ever more important.
Acknowledgments: This work was supported by UK EPSRC grant EP/N031768/1 (POETS project).
Publisher Copyright:
© 2021 IEEE.
Copyright:
Copyright 2021 Elsevier B.V., All rights reserved.
Venue - Dates:
29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2021, Virtual, Valladolid, Spain, 2021-03-10 - 2021-03-12
Identifiers
Local EPrints ID: 453692
URI: http://eprints.soton.ac.uk/id/eprint/453692
PURE UUID: 3c42832c-ea51-42f2-8d08-890aff2b9203
Catalogue record
Date deposited: 20 Jan 2022 17:46
Last modified: 18 Mar 2024 04:04
Contributors
Author: Matthew Naylor
Author: Simon W. Moore
Author: David Thomas
Author: Jonathan R. Beaumont
Author: Shane Fleming
Author: Mark Vousden
Author: A. Theodore Markettos
Author: Thomas Bytheway
Author: Andrew Brown