NLP verification: towards a general methodology for certifying robustness

Machine learning has exhibited substantial success in the field of natural language processing (NLP). For example, large language models have empirically proven capable of producing text of high complexity and cohesion. At the same time, they are prone to inaccuracies and hallucinations. As these systems are increasingly integrated into real-world applications, ensuring their safety and reliability becomes a primary concern. In safety-critical contexts, such models must be robust to variability or attack and must give guarantees over their output. Computer vision pioneered the use of formal verification of neural networks for such scenarios and developed common verification standards and pipelines, leveraging precise formal reasoning about geometric properties of data manifolds. In contrast, NLP verification methods have only recently appeared in the literature. While these papers present sophisticated algorithms in their own right, they have not yet crystallised into a common methodology. They are often light on the pragmatic issues of NLP verification, and the area remains fragmented. In this paper, we attempt to distil and evaluate the general components of an NLP verification pipeline that emerges from the progress in the field to date. Our contributions are twofold. First, we propose a general methodology to analyse the effect of the embedding gap: the discrepancy between the verification of geometric subspaces and the semantic meaning of the sentences that those subspaces are supposed to represent. We propose a number of practical NLP methods that can help to quantify the effects of the embedding gap. Second, we give a general method for training and verification of neural networks that leverages a more precise geometric estimation of the semantic similarity of sentences in the embedding space and helps to overcome the effects of the embedding gap in practice.
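The embedding gap described in the abstract can be illustrated with a toy sketch. The example below is an assumption-laden illustration, not the authors' method: it uses hypothetical hand-written 2-D vectors in place of real sentence embeddings, and it takes an axis-aligned bounding box as the geometric subspace, a choice the abstract does not specify. It shows how a region built around paraphrases of one sentence can also contain the embedding of a sentence with a different meaning.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Box = Tuple[List[float], List[float]]


def bounding_box(points: Sequence[Vector]) -> Box:
    """Return the smallest axis-aligned box (lower, upper) containing all points."""
    dims = range(len(points[0]))
    lower = [min(p[d] for p in points) for d in dims]
    upper = [max(p[d] for p in points) for d in dims]
    return lower, upper


def contains(box: Box, point: Vector) -> bool:
    """Check whether a point lies inside the box (boundaries included)."""
    lower, upper = box
    return all(lo <= x <= hi for lo, x, hi in zip(lower, point, upper))


# Hypothetical 2-D "embeddings"; real sentence embeddings have far more dimensions.
paraphrases = [(0.10, 0.80), (0.30, 0.60), (0.20, 0.90)]  # same meaning (assumed)
unrelated = (0.25, 0.65)   # different meaning, but embedded nearby (assumed)
far_away = (0.90, 0.10)    # different meaning, embedded far from the paraphrases

box = bounding_box(paraphrases)

print(contains(box, paraphrases[0]))  # True: the box covers its own paraphrases
print(contains(box, unrelated))       # True: the geometric region admits a semantically unrelated point
print(contains(box, far_away))        # False
```

A verifier that certifies a network's behaviour on every point of `box` would therefore also be making a claim about `unrelated`, even though that point does not share the intended meaning. Under the assumptions above, that mismatch between geometric and semantic membership is the kind of discrepancy the abstract calls the embedding gap.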

adversarial training, machine learning, natural language processing, Neural networks, robustness, verification

10.1017/S0956792525000099

0956-7925

Casadio, Marco

Dinkar, Tanvi

Komendantskaya, Ekaterina

Arnaboldi, Luca

Daggitt, Matthew L.

Isac, Omri

Katz, Guy

Rieser, Verena

Lemon, Oliver

2 April 2025

Casadio, Marco, Dinkar, Tanvi, Komendantskaya, Ekaterina, Arnaboldi, Luca, Daggitt, Matthew L., Isac, Omri, Katz, Guy, Rieser, Verena and Lemon, Oliver (2025) NLP verification: towards a general methodology for certifying robustness. European Journal of Applied Mathematics. (doi:10.1017/S0956792525000099).

Record type: Article

Text

nlp-verification-towards-a-general-methodology-for-certifying-robustness - Version of Record

Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 18 February 2025

e-pub ahead of print date: 2 April 2025

Published date: 2 April 2025

Additional Information: Publisher Copyright: © The Author(s), 2025. Published by Cambridge University Press.

Keywords: adversarial training, machine learning, natural language processing, Neural networks, robustness, verification

Identifiers

Local EPrints ID: 500801

URI: http://eprints.soton.ac.uk/id/eprint/500801

DOI: 10.1017/S0956792525000099

ISSN: 0956-7925

PURE UUID: 79cbc5d2-660f-40e0-bb20-be9eeebefb40

ORCID for Ekaterina Komendantskaya: orcid.org/0000-0002-3240-0987

Catalogue record

Date deposited: 13 May 2025 16:55

Last modified: 30 Aug 2025 02:14

Contributors

Author: Marco Casadio

Author: Tanvi Dinkar

Author: Ekaterina Komendantskaya

Author: Luca Arnaboldi

Author: Matthew L. Daggitt

Author: Omri Isac

Author: Guy Katz

Author: Verena Rieser

Author: Oliver Lemon
