Termination detection for fine-grained message-passing architectures
Pages: 17-24
Naylor, Matthew
6c0f1008-4db4-4c09-8461-b2355bf25275
Moore, Simon W.
e9f2be21-1fa3-43aa-a3e2-fc8519f97a00
Mokhov, Andrey
7ad0909b-34e8-4f32-908c-b6406b397776
Thomas, David
5701997d-7de3-4e57-a802-ea2bd3e6ab6c
Beaumont, Jonathan R.
468f446e-2cff-4285-a490-44e634f468c0
Fleming, Shane
1a7f7be0-0c3f-4125-9298-5b5a6e0bc76e
Markettos, A. Theodore
76ebcf7c-05b3-4560-ba47-23eeb6f7787b
Bytheway, Thomas
95af7e4b-5daf-4fc4-b6e5-96a3b9f95b4c
Brown, Andrew
5c19e523-65ec-499b-9e7c-91522017d7e0
Naylor, Matthew, Moore, Simon W., Mokhov, Andrey, Thomas, David, Beaumont, Jonathan R., Fleming, Shane, Markettos, A. Theodore, Bytheway, Thomas and Brown, Andrew (2020) Termination detection for fine-grained message-passing architectures. Hannig, Frank, Navaridas, Javier, Koch, Dirk and Abdelhadi, Ameer (eds.) In 2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP). vol. 2020-July, IEEE. pp. 17-24. (doi:10.1109/ASAP49362.2020.00012).
Record type: Conference or Workshop Item (Paper)
Abstract
Barrier primitives provided by standard parallel programming APIs are the primary means by which applications implement global synchronisation. Typically these primitives are fully committed to synchronisation in the sense that, once a barrier is entered, synchronisation is the only way out. For message-passing applications, this raises the question of what happens when a message arrives at a thread that already resides in a barrier. Without a satisfactory answer, barriers do not interact with message-passing in any useful way.
In this paper, we propose a new refutable barrier primitive that combines with message-passing to form a simple, expressive, efficient, well-defined API. It has a clear semantics based on termination detection, and supports the development of both globally-synchronous and asynchronous parallel applications.
To evaluate the new primitive, we implement it in a prototype large-scale message-passing machine with 49,152 RISC-V threads distributed over 48 FPGAs. We show that hardware support for the primitive leads to a highly efficient implementation, capable of synchronisation rates that are an order of magnitude higher than what is achievable in software. Using the primitive, we implement synchronous and asynchronous versions of a range of applications, observing that each version can have significant advantages over the other, depending on the application. Therefore, a barrier primitive supporting both styles can greatly assist the development of parallel programs.
This record has no associated files available for download.
More information
e-pub ahead of print date: 31 July 2020
Additional Information:
Funding Information:
Thanks to He Li and Mayhar Shahsavari. This work was supported by EPSRC grant EP/N031768/1 (POETS project).
Publisher Copyright:
© 2020 IEEE.
Venue - Dates:
31st IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2020, Manchester, United Kingdom, 2020-07-06 - 2020-07-08
Identifiers
Local EPrints ID: 470108
URI: http://eprints.soton.ac.uk/id/eprint/470108
ISSN: 1063-6862
PURE UUID: 34eb21ef-348d-4e29-8bc7-9ed315caa62b
Catalogue record
Date deposited: 03 Oct 2022 16:52
Last modified: 11 May 2024 02:06
Contributors
Author: Matthew Naylor
Author: Simon W. Moore
Author: Andrey Mokhov
Author: David Thomas
Author: Jonathan R. Beaumont
Author: Shane Fleming
Author: A. Theodore Markettos
Author: Thomas Bytheway
Author: Andrew Brown
Editor: Frank Hannig
Editor: Javier Navaridas
Editor: Dirk Koch
Editor: Ameer Abdelhadi