Synchronization in graph analysis algorithms on the Partially Ordered Event-Triggered Systems many-core architecture
71-88
Rafiev, Ashur
Yakovlev, Alex
Tarawneh, Ghaith
Naylor, Matthew F.
Moore, Simon W.
Thomas, David B.
Bragg, Graeme M.
Vousden, Mark L.
Brown, Andrew D.
May 2022
Rafiev, Ashur, Yakovlev, Alex, Tarawneh, Ghaith, Naylor, Matthew F., Moore, Simon W., Thomas, David B., Bragg, Graeme M., Vousden, Mark L. and Brown, Andrew D.
(2022)
Synchronization in graph analysis algorithms on the Partially Ordered Event-Triggered Systems many-core architecture.
IET Computers and Digital Techniques, 16 (2-3), 71-88.
(doi:10.1049/cdt2.12041).
Abstract
One of the key problems in designing and implementing graph analysis algorithms for distributed platforms is to find an optimal way of managing communication flows in the massively parallel processing network. Message-passing and global synchronization are powerful abstractions in this regard, especially when used in combination. This paper studies the use of a hardware-implemented refutable global barrier as a design optimization technique aimed at unifying these abstractions at the API level. The paper explores the trade-offs between the related overheads and performance factors on a message-passing prototype machine with 49,152 RISC-V threads distributed over 48 FPGAs (called the Partially Ordered Event-Triggered Systems platform). Our experiments show that some graph applications favour synchronized communication, but the effect is hard to predict in general because of the interplay between multiple hardware and software factors. A classifier model is therefore proposed and implemented to perform such a prediction based on the application graph topology parameters: graph diameter, degree of connectivity, and reconvergence metric. The presented experimental results demonstrate that the correct choice of communication mode, granted by the new model-driven approach, helps to achieve 3.22 times faster computation time on average compared to the baseline platform operation.
Text
IET Computers Digital Tech - 2022 - Rafiev - Synchronization in graph analysis algorithms on the Partially Ordered
- Version of Record
More information
Accepted/In Press date: 21 March 2022
e-pub ahead of print date: 3 April 2022
Published date: May 2022
Additional Information:
Funding Information:
This work is supported by EPSRC/UK as a part of the POETS project EP/N031768/1.
Publisher Copyright:
© 2022 The Authors. IET Computers & Digital Techniques published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.
Identifiers
Local EPrints ID: 468767
URI: http://eprints.soton.ac.uk/id/eprint/468767
ISSN: 1751-8601
PURE UUID: e0aa54fe-6b76-4214-88d6-aa31ba5b4622
Catalogue record
Date deposited: 25 Aug 2022 16:36
Last modified: 11 May 2024 02:06
Contributors
Author: Ashur Rafiev
Author: Alex Yakovlev
Author: Ghaith Tarawneh
Author: Matthew F. Naylor
Author: Simon W. Moore
Author: David B. Thomas
Author: Graeme M. Bragg
Author: Mark L. Vousden
Author: Andrew D. Brown