VLSI Implementation of Fully Parallel LTE Turbo Decoders
Turbo codes facilitate near-capacity transmission throughputs by achieving a reliable iterative forward error correction. However, owing to the serial data dependence imposed by the logarithmic Bahl–Cocke–Jelinek–Raviv algorithm, the limited processing throughputs of the conventional turbo decoder implementations impose a severe bottleneck upon the overall throughputs of real-time communication schemes. Motivated by this, we recently proposed a floating-point fully parallel turbo decoder (FPTD) algorithm, which eliminates the serial data dependence, allowing parallel processing and hence significantly reducing the number of clock cycles required. In this paper, we conceive a technique for reducing the critical datapath of the FPTD, and we propose a novel fixed-point version as well as its very large scale integration (VLSI) implementation. We also propose a novel technique, which allows the FPTD to also decode shorter frames employing compatible interleaver patterns. We strike beneficial tradeoffs amongst the latency, core area, and energy consumption by investigating the minimum bit widths and techniques for message log-likelihood ratio scaling and state metric normalization. Accordingly, the design flow and design tradeoffs considered in this paper are also applicable to other fixed-point implementations of error correction decoders. We demonstrate that upon using Taiwan Semiconductor Manufacturing Company (TSMC) 65-nm low-power technology for decoding the longest long-term evolution frames (6144 b) received over an additive white Gaussian noise channel having Eb/N0 = 1 dB, the proposed fixed-point FPTD VLSI achieves a processing throughput of 21.9 Gb/s and a processing latency of 0.28 µs. These results are 17.1 times superior to those of the state-of-the-art benchmarker. Furthermore, the proposed fixed-point FPTD VLSI achieves an energy consumption of 2.69 µJ/frame and a normalized core area of 5 mm²/Gb/s, which are 34% and 23% lower than those of the benchmarker, respectively.
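As a rough consistency check on the figures quoted in the abstract, the sketch below relates the frame length, latency, and throughput, and derives an energy-per-bit figure. It assumes that throughput is simply the frame length divided by the processing latency (one frame decoded at a time); the abstract does not state this relationship, and the energy-per-bit value is a derived illustration rather than a number reported in the source. The function and constant names are hypothetical.

```python
# Figures quoted in the abstract (longest LTE frame, Eb/N0 = 1 dB).
FRAME_BITS = 6144              # bits per frame
LATENCY_S = 0.28e-6            # processing latency, seconds
ENERGY_PER_FRAME_J = 2.69e-6   # energy consumption, joules per frame


def throughput_gbps(frame_bits: int, latency_s: float) -> float:
    """Throughput in Gb/s, ASSUMING one frame is decoded per latency period."""
    return frame_bits / latency_s / 1e9


def energy_per_bit_nj(energy_per_frame_j: float, frame_bits: int) -> float:
    """Energy per decoded bit in nJ (derived figure, not quoted in the source)."""
    return energy_per_frame_j / frame_bits * 1e9


if __name__ == "__main__":
    print(f"throughput ~ {throughput_gbps(FRAME_BITS, LATENCY_S):.2f} Gb/s")
    print(f"energy/bit ~ {energy_per_bit_nj(ENERGY_PER_FRAME_J, FRAME_BITS):.3f} nJ")
```

Under that assumption, the computed throughput agrees with the quoted 21.9 Gb/s to within rounding of the latency figure.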
323-346
Li, An
099fae06-fd69-4cab-933c-43a9b94ce1f1
Xiang, Luping
ee0a15fb-5774-4004-b236-e301d323d786
Chen, Taihai
ba25efc9-bf08-47ee-965f-f74be7b9cc50
Maunder, Robert G.
76099323-7d58-4732-a98f-22a662ccba6c
Al-Hashimi, Bashir M.
0b29c671-a6d2-459c-af68-c4614dce3b5d
Hanzo, Lajos
66e7266f-3066-4fc0-8391-e000acce71a1
Li, An, Xiang, Luping, Chen, Taihai, Maunder, Robert G., Al-Hashimi, Bashir M. and Hanzo, Lajos
(2016)
VLSI Implementation of Fully Parallel LTE Turbo Decoders.
IEEE Access, 4, 323-346.
(doi:10.1109/ACCESS.2016.2515719).
Text
access-hanzo-2515719-proof.pdf
- Other
Available under License Other.
More information
Accepted/In Press date: 5 January 2016
e-pub ahead of print date: 11 January 2016
Organisations:
Electronic & Software Systems, Southampton Wireless Group
Identifiers
Local EPrints ID: 386016
URI: http://eprints.soton.ac.uk/id/eprint/386016
PURE UUID: 72446e6a-9fe7-4ce7-bd51-0a17b5e5f902
Catalogue record
Date deposited: 16 Jan 2016 03:37
Last modified: 03 Sep 2024 01:43
Contributors
Author:
An Li
Author:
Luping Xiang
Author:
Taihai Chen
Author:
Robert G. Maunder
Author:
Bashir M. Al-Hashimi
Author:
Lajos Hanzo