The University of Southampton
University of Southampton Institutional Repository

Reliable computation with unreliable computers

Brown, A.D., Mills, Rob, Dugan, Kier J., Reeve, Jeff and Furber, Steve (2015) Reliable computation with unreliable computers. IET Computers & Digital Techniques, 1-8. (doi:10.1049/iet-cdt.2014.0110).

Record type: Article

Abstract

As computing systems continue their unquenchable rise towards and through million-core architectures, two considerations that were once unimportant become increasingly dominant: power consumption (be it FLOPS/W or W/mm²) and reliability. This study is concerned with the latter: in a system of a million cores, it is unrealistic to expect 100% functionality on power-up; equally, operational availability degrades with time. Monitoring and maintaining the health of such a system using traditional techniques is costly, and most such techniques rely on some form of central overseer or monitor to make a final judgement about system availability, which introduces a single point of failure. Large systems of the future will consist of hardware and software that work synergistically to cope with isolated points of failure, allowing the gross behaviour of the system to degrade gracefully and meaningfully in the face of faults. This study describes one such system: the Spiking Neural Network Architecture (SpiNNaker), a million-core machine with layered fault tolerance built in at many levels. The authors show how the system may be used to solve the canonical distributed heat diffusion equation, and how the quality of the solution is modulated by the effects of partial system failure.
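The abstract's closing claim, that the quality of a distributed heat-diffusion solution degrades with partial system failure rather than collapsing outright, can be illustrated with a toy model. This is a minimal sketch, not the authors' SpiNNaker implementation. It assumes (hypothetically) one grid point per core, Jacobi relaxation of the steady-state heat (Laplace) equation, a top edge held at 1.0 with the other edges at 0.0, and a fault model in which a dead core simply stops updating while its live neighbours average over whichever neighbours survive. The names `solve_heat` and `max_error` are illustrative only.

```python
import random


def solve_heat(n, failed=frozenset(), iters=1000):
    """Jacobi relaxation of the steady-state 2-D heat equation on an n x n grid.

    Hypothetical test problem: top row held at 1.0, other edges at 0.0.
    Each interior point stands in for one core. Points in `failed` never
    update (they keep their initial 0.0) and are ignored by their neighbours,
    which average over their live neighbours only.
    """
    u = [[0.0] * n for _ in range(n)]
    for j in range(n):
        u[0][j] = 1.0  # hot top edge (fixed boundary)
    for _ in range(iters):
        new = [row[:] for row in u]  # Jacobi: read old values, write new ones
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i, j) in failed:
                    continue  # dead core: no computation
                live = [u[a][b]
                        for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if (a, b) not in failed]
                if live:  # isolated points keep their previous value
                    new[i][j] = sum(live) / len(live)
        u = new
    return u


def max_error(ref, u, failed):
    """Largest deviation from the fault-free solution over surviving interior points."""
    n = len(ref)
    return max(abs(ref[i][j] - u[i][j])
               for i in range(1, n - 1) for j in range(1, n - 1)
               if (i, j) not in failed)


if __name__ == "__main__":
    n = 16
    ref = solve_heat(n)  # fault-free reference
    rng = random.Random(0)
    interior = [(i, j) for i in range(1, n - 1) for j in range(1, n - 1)]
    for frac in (0.0, 0.05, 0.1, 0.2):
        failed = frozenset(rng.sample(interior, int(frac * len(interior))))
        err = max_error(ref, solve_heat(n, failed), failed)
        print(f"{frac:.0%} of cores failed: max error {err:.3f}")
```

Every surviving core still produces a value, and the departure from the fault-free answer can be measured as a function of the failure fraction. Excluding dead neighbours from the average is only one possible local recovery policy; the paper should be consulted for the mechanisms SpiNNaker actually uses.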

Text
CDT-SI-2014-0110.pdf - Version of Record
Restricted to Repository staff only

More information

Accepted/In Press date: 9 October 2014
e-pub ahead of print date: 27 April 2015
Organisations: Faculty of Physical Sciences and Engineering

Identifiers

Local EPrints ID: 375932
URI: http://eprints.soton.ac.uk/id/eprint/375932
PURE UUID: b360823a-e53a-4f77-b749-0f23234bf582

Catalogue record

Date deposited: 20 Apr 2015 13:22
Last modified: 14 Mar 2024 19:35

Contributors

Author: A.D. Brown
Author: Rob Mills
Author: Kier J. Dugan
Author: Jeff Reeve
Author: Steve Furber
