The University of Southampton
University of Southampton Institutional Repository

Implementation of a fully-parallel turbo decoder on a general-purpose graphics processing unit


Li, An, Maunder, Robert G., Al-Hashimi, Bashir and Hanzo, Lajos (2016) Implementation of a fully-parallel turbo decoder on a general-purpose graphics processing unit. IEEE Access, 4, 5624-5639. (doi:10.1109/ACCESS.2016.2586309).

Record type: Article

Abstract

Turbo codes comprising a parallel concatenation of upper and lower convolutional codes are widely employed in state-of-the-art wireless communication standards, since they facilitate transmission throughputs that closely approach the channel capacity. However, this necessitates high processing throughputs in order for the turbo code to support real-time communications. In state-of-the-art turbo code implementations, the processing throughput is typically limited by the data dependencies that occur within the forward and backward recursions of the Log-BCJR algorithm, which is employed during turbo decoding. In contrast to the highly-serial Log-BCJR turbo decoder, we have recently proposed a novel Fully Parallel Turbo Decoder (FPTD) algorithm, which can eliminate the data dependencies and perform fully parallel processing. In this paper, we propose an optimized FPTD algorithm, which reformulates the operation of the FPTD algorithm so that the upper and lower decoders have identical operation, in order to support Single Instruction Multiple Data (SIMD) operation. This allows us to develop a novel General Purpose Graphics Processing Unit (GPGPU) implementation of the FPTD, which has application in Software-Defined Radios (SDRs) and virtualized Cloud-Radio Access Networks (C-RANs). As a benefit of its higher degree of parallelism, we show that our FPTD improves the processing throughput of the Log-BCJR turbo decoder by between 2.3 and 9.2 times, when employing a high-specification GPGPU. However, this is achieved at the cost of a moderate increase of the overall complexity by between 1.7 and 3.3 times.
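
The data dependency mentioned in the abstract can be illustrated with a toy sketch. The Python below is not the authors' FPTD or their GPGPU implementation. It only contrasts a serial max-log forward recursion, where each trellis step must wait for the previous one, with a schedule in which every step is updated at once from the previous iteration's values. The two-state trellis, the random branch metrics and all function names are assumptions made purely for illustration.

```python
import random

NUM_STATES = 2  # toy trellis size (assumption, chosen only to keep the sketch small)


def forward_step(alpha_prev, gamma_k):
    """Max-log forward update: alpha_k[s] = max over s' of alpha_prev[s'] + gamma_k[s'][s]."""
    return [
        max(alpha_prev[sp] + gamma_k[sp][s] for sp in range(NUM_STATES))
        for s in range(NUM_STATES)
    ]


def serial_forward(gamma, alpha0):
    """Serial recursion: step k cannot start until step k-1 has finished."""
    alphas = [alpha0]
    for gamma_k in gamma:
        alphas.append(forward_step(alphas[-1], gamma_k))
    return alphas


def parallel_forward(gamma, alpha0, iterations):
    """Parallel schedule: all steps are updated together from the previous iteration."""
    n = len(gamma)
    alphas = [alpha0] + [[0.0] * NUM_STATES for _ in range(n)]
    for _ in range(iterations):
        old = alphas
        # Each list element depends only on `old`, so all n updates are independent.
        alphas = [alpha0] + [forward_step(old[k], gamma[k]) for k in range(n)]
    return alphas


def make_gamma(n, seed=0):
    """Hypothetical random branch metrics standing in for real channel information."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(NUM_STATES)] for _ in range(NUM_STATES)]
        for _ in range(n)
    ]


if __name__ == "__main__":
    gamma = make_gamma(6)
    alpha0 = [0.0, float("-inf")]  # trellis assumed to start in state 0
    # With as many iterations as trellis steps, both schedules agree.
    print(serial_forward(gamma, alpha0) == parallel_forward(gamma, alpha0, len(gamma)))
```

In this toy setting the parallel schedule needs as many iterations as there are trellis steps to reproduce the serial forward metrics exactly, but every iteration consists of independent updates that a SIMD device could execute together. It does more work in total in exchange for removing the serial dependency, which loosely mirrors the complexity-for-throughput trade-off that the abstract reports.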

Text
FPTD_GPU_IEEEaccess.pdf - Accepted Manuscript
Available under License Creative Commons Attribution.
Download (822kB)
Text
07501831.pdf - Accepted Manuscript
Available under License Creative Commons Attribution.
Download (683kB)

More information

Accepted/In Press date: 28 June 2016
Published date: 29 June 2016
Keywords: cloud radio access network, fully-parallel turbo decoder, parallel processing, GPGPU computing, software defined radio
Organisations: Southampton Wireless Group

Identifiers

Local EPrints ID: 397525
URI: http://eprints.soton.ac.uk/id/eprint/397525
PURE UUID: 69af5b24-579d-4f5d-8c25-23eaf8f1cbbc
ORCID for Robert G. Maunder: orcid.org/0000-0002-7944-2615
ORCID for Lajos Hanzo: orcid.org/0000-0002-2636-5214

Catalogue record

Date deposited: 29 Jun 2016 13:48
Last modified: 18 Mar 2024 03:09

Contributors

Author: An Li
Author: Robert G. Maunder
Author: Bashir Al-Hashimi
Author: Lajos Hanzo
