A Suitability Score to optimize CNNs on an FPGA accelerator
This thesis presents a structured optimisation methodology for deploying convolutional neural networks (CNNs) on field-programmable gate arrays (FPGAs), targeting high-throughput operation under constraints of computational resources and latency. The proposed approach integrates model-level restructuring, hardware-aware scheduling, and hardware–software co-design and deployment on FPGAs to deliver high-throughput performance while preserving CNN model accuracy. Oesophageal cancer detection is used as a representative case study, providing a computationally intensive and accuracy-critical scenario for evaluating the proposed methods.
The proposed methodology introduces the Suitability Score, a metric identifying which convolutional layers benefit most from hardware-aware optimisation. This analysis enables selective adjustments that reduce computational cost without sacrificing model accuracy. Based on these insights, a layer-specific pipelining strategy improves the hardware resource efficiency and inference latency of the deployed CNN accelerator. The optimised model is deployed on an FPGA using a co-design framework, demonstrating high throughput and competitive accuracy while consuming fewer hardware resources than FPGA-based CNN accelerators reported in the literature.
The proposed accelerator is deployed on an AMD Kintex UltraScale+ FPGA and evaluated against graphics processing unit (GPU)-based inference and existing FPGA implementations. Compared to a GPU baseline, the accelerator achieves at least 47.6% higher throughput and more than twice the energy efficiency. In FPGA-based comparisons, it processes up to 7.8× more images per second while using fewer hardware resources. The accelerator achieves a throughput of 76.19 images/s with 97.45% accuracy while maintaining low resource and power consumption. These results indicate that the proposed FPGA-based approach supports real-time CNN inference with high accuracy, high throughput, and efficient hardware usage, making it suitable for broader use in embedded, latency-sensitive image analysis applications.
University of Southampton
Saglam, Serkan
7507c7d4-d9ca-46e9-b29b-4dc7509e9280
2025
Zwolinski, Mark
adfcb8e7-877f-4bd7-9b55-7553b6cb3ea0
Ramchurn, Gopal
1d62ae2a-a498-444e-912d-a6082d3aaea3
Underwood, Tim
8e81bf60-edd2-4b0e-8324-3068c95ea1c6
Saglam, Serkan
(2025)
A Suitability Score to optimize CNNs on an FPGA accelerator.
University of Southampton, Doctoral Thesis, 143pp.
Record type: Thesis (Doctoral)
Text
Thesis_of_Doctor_of_Philosophy_Serkan_Saglam__Final
Text
Final-thesis-submission-Examination-MR-Serkan-Saglam
Restricted to Repository staff only
More information
Published date: 2025
Identifiers
Local EPrints ID: 505575
URI: http://eprints.soton.ac.uk/id/eprint/505575
PURE UUID: c7d80f63-d21b-4ce2-a351-cf3d4f1ab4c5
Catalogue record
Date deposited: 14 Oct 2025 16:41
Last modified: 18 Oct 2025 01:41
Contributors
Author: Serkan Saglam
Thesis advisor: Mark Zwolinski
Thesis advisor: Gopal Ramchurn