The University of Southampton
University of Southampton Institutional Repository

An NLP-driven framework for Business Email Compromise detection and authorship verification

University of Southampton
Almutairi, Amirah
93ab82cb-5649-45b5-b6a7-a1ce15446354
Al Hashimy, Nawfal
e73b96f2-bf15-40cb-9af5-23c10ea8e319
Kang, Boojoong
cfccdccd-f57f-448e-9f3c-1c51134c48dd

Almutairi, Amirah (2025) An NLP-driven framework for Business Email Compromise detection and authorship verification. University of Southampton, Doctoral Thesis, 127pp.

Record type: Thesis (Doctoral)

Abstract

Business Email Compromise (BEC) represents a significant cybersecurity threat that exploits linguistic impersonation and social engineering, rather than relying on traditional malware or malicious attachments. These attacks often bypass conventional detection systems by mimicking the language, tone, and identity of trusted individuals within an organization.

This thesis investigates content-based approaches to BEC detection using a suite of natural language processing (NLP) models. It first introduces a transformer-based classifier to identify semantic indicators of deception within email body text. It then presents a Siamese authorship verification (AV) model designed to detect stylistic inconsistencies, even under adversarial mimicry. These components are integrated into a unified multi-task learning (MTL) framework that jointly optimizes for BEC detection and AV, leveraging shared representations while preserving task-specific objectives.
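As a rough illustration of what jointly optimizing the two tasks can look like, the sketch below combines a BEC-detection loss and an authorship-verification loss into one weighted objective. It is not the thesis's implementation: the function names, the `alpha` weighting, the `scale` factor, and the use of cosine similarity between two embeddings (standing in for the outputs of a shared Siamese encoder) are all assumptions made for this example.

```python
import math


def sigmoid(x):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def binary_cross_entropy(prob, label, eps=1e-12):
    """Cross-entropy for one predicted probability and a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def joint_loss(bec_logit, bec_label, emb_a, emb_b, same_author,
               alpha=0.5, scale=5.0):
    """Weighted sum of a BEC-detection loss and an AV loss.

    alpha and scale are hypothetical hyperparameters chosen for this sketch.
    emb_a and emb_b stand in for two email embeddings from a shared encoder.
    """
    # Task 1: is this email a BEC attempt?
    bec_loss = binary_cross_entropy(sigmoid(bec_logit), bec_label)
    # Task 2: were the two emails written by the same author?
    av_prob = sigmoid(scale * cosine_similarity(emb_a, emb_b))
    av_loss = binary_cross_entropy(av_prob, same_author)
    return alpha * bec_loss + (1.0 - alpha) * av_loss
```

With `alpha=1.0` the objective reduces to the BEC loss alone; lowering `alpha` shifts weight toward the AV term.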

To support empirical evaluation, the thesis proposes a structured taxonomy of BEC fraud and constructs a synthetic dataset through prompt-engineered language model fine-tuning and human validation. Experiments conducted on a combination of real and synthetic emails demonstrate that the MTL framework achieves up to 97% F1-score for BEC detection and 93% for AV, outperforming transfer learning baselines while reducing false positives and computational cost.
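For reference, the F1-score reported above is the harmonic mean of precision and recall. A minimal sketch of the computation follows; the function name and the toy labels in the usage note are illustrative assumptions and are not data from the thesis.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1-score for the positive class, computed from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])` has two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1-score is also 2/3.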

This work contributes a principled, modular, and extensible framework for enhancing email security through joint semantic and stylistic analysis, addressing critical gaps in current defenses against sophisticated impersonation-based attacks.

Text
Almutairi_PhD_Thesis_2025_PDF-A3b
Download (3MB)
Text
Final-thesis-submission-Examination-Ms-Amirah-Almutairi
Restricted to Repository staff only

More information

Published date: 2025

Identifiers

Local EPrints ID: 505289
URI: http://eprints.soton.ac.uk/id/eprint/505289
PURE UUID: 0db58b7a-3f88-4760-92f8-575286f61c1f
ORCID for Amirah Almutairi: orcid.org/0000-0002-2194-7936
ORCID for Nawfal Al Hashimy: orcid.org/0000-0002-1129-5217
ORCID for Boojoong Kang: orcid.org/0000-0001-5984-9867

Catalogue record

Date deposited: 06 Oct 2025 16:43
Last modified: 07 Oct 2025 02:03

Contributors

Author: Amirah Almutairi
Thesis advisor: Nawfal Al Hashimy
Thesis advisor: Boojoong Kang
