Large-scale performance of a DSL-based multi-block structured-mesh application for Direct Numerical Simulation
SBLI (Shock-wave/Boundary-layer Interaction) is a large-scale Computational Fluid Dynamics (CFD) application, developed over 20 years at the University of Southampton and extensively used within the UK Turbulence Consortium. It is capable of performing Direct Numerical Simulation (DNS) or Large Eddy Simulation (LES) of shock-wave/boundary-layer interaction problems over highly detailed multi-block structured mesh geometries. SBLI presents major challenges in data organization and movement that must be overcome to sustain high performance on emerging massively parallel hardware platforms. In this paper we present research towards achieving this goal through the OPS embedded domain-specific language. OPS targets the domain of multi-block structured mesh applications. It provides an API embedded in C/C++ and Fortran and uses automatic code generation and compilation to produce executables capable of running on a range of parallel hardware systems. The core functionality of SBLI is captured using a new framework called OpenSBLI, which enables a developer to declare the partial differential equations in Einstein notation and then automatically carry out the discretization and generate OPS (C/C++) API code. OPS is then used to automatically generate a wide range of parallel implementations. Using this multi-layered abstraction approach, we demonstrate how new opportunities for further optimization, such as fine-tuning the computational intensity and reducing data movement, can be identified and applied automatically. Performance results demonstrate that there is no performance loss due to the high-level development strategy with OPS and OpenSBLI, with performance matching or exceeding the hand-tuned original code on all CPU nodes tested. The data movement optimizations provide over 3× speedup on CPU nodes, while GPUs provide a 5× speedup over the best performing CPU node.
The OPS-generated parallel code also demonstrates excellent scalability on nearly 100K cores on a Cray XC30 (ARCHER at EPCC) and on over 4K GPUs on a Cray XK7 (Titan at ORNL).
Pages: 130-146
Mudalige, G.R.
Reguly, I.Z.
Jammy, S.P.
Jacobs, C.T.
Giles, M.B.
Sandham, N.D.
September 2019
Mudalige, G.R., Reguly, I.Z., Jammy, S.P., Jacobs, C.T., Giles, M.B. and Sandham, N.D. (2019) Large-scale performance of a DSL-based multi-block structured-mesh application for Direct Numerical Simulation. Journal of Parallel and Distributed Computing, 131, 130-146. (doi:10.1016/j.jpdc.2019.04.019).
More information
Accepted/In Press date: 21 April 2019
e-pub ahead of print date: 6 May 2019
Published date: September 2019
Identifiers
Local EPrints ID: 431598
URI: http://eprints.soton.ac.uk/id/eprint/431598
ISSN: 0743-7315
PURE UUID: f3594194-372c-4ce3-9171-3ddc51a867af
Catalogue record
Date deposited: 10 Jun 2019 16:30
Last modified: 16 Mar 2024 07:49
Contributors
Author: G.R. Mudalige
Author: I.Z. Reguly
Author: S.P. Jammy
Author: C.T. Jacobs
Author: M.B. Giles
Author: N.D. Sandham