An NLP-driven framework for Business Email Compromise detection and authorship verification
Business Email Compromise (BEC) represents a significant cybersecurity threat that exploits linguistic impersonation and social engineering, rather than relying on traditional malware or malicious attachments. These attacks often bypass conventional detection systems by mimicking the language, tone, and identity of trusted individuals within an organization.
This thesis investigates content-based approaches to BEC detection using a suite of natural language processing (NLP) models. It first introduces a transformer-based classifier to identify semantic indicators of deception within email body text. It then presents a Siamese authorship verification (AV) model designed to detect stylistic inconsistencies, even under adversarial mimicry. These components are integrated into a unified multi-task learning (MTL) framework that jointly optimizes for BEC detection and AV, leveraging shared representations while preserving task-specific objectives.
To support empirical evaluation, the thesis proposes a structured taxonomy of BEC fraud and constructs a synthetic dataset through prompt-engineered language model fine-tuning and human validation. Experiments conducted on a combination of real and synthetic emails demonstrate that the MTL framework achieves up to 97% F1-score for BEC detection and 93% for AV, outperforming transfer learning baselines while reducing false positives and computational cost.
This work contributes a principled, modular, and extensible framework for enhancing email security through joint semantic and stylistic analysis, addressing critical gaps in current defenses against sophisticated impersonation-based attacks.
University of Southampton
Almutairi, Amirah
2025
Al Hashimy, Nawfal
Kang, Boojoong
Almutairi, Amirah (2025) An NLP-driven framework for Business Email Compromise detection and authorship verification. University of Southampton, Doctoral Thesis, 127pp.
Record type: Thesis (Doctoral)
Text
Almutairi_PhD_Thesis_2025_PDF-A3b
Text
Final-thesis-submission-Examination-Ms-Amirah-Almutairi
Restricted to Repository staff only
More information
Published date: 2025
Identifiers
Local EPrints ID: 505289
URI: http://eprints.soton.ac.uk/id/eprint/505289
PURE UUID: 0db58b7a-3f88-4760-92f8-575286f61c1f
Catalogue record
Date deposited: 06 Oct 2025 16:43
Last modified: 07 Oct 2025 02:03
Contributors
Author: Amirah Almutairi
Thesis advisor: Nawfal Al Hashimy
Thesis advisor: Boojoong Kang