An evolutionary black-box framework for adversarial prompt generation in large language models
An evolutionary black-box framework for adversarial prompt generation in large language models
Large language models (LLMs) remain susceptible to adversarial prompts that can bypass alignment mechanisms. Existing approaches to adversarial prompt generation typically rely on manual prompt engineering, helper LLMs, or white-box adversarial machine learning methods, which either lack scalability or require access to model internals. In this paper, we propose a novel black-box framework for automated adversarial prompt generation based on evolutionary algorithms. The framework is instantiated using a genetic algorithm and an evolution strategy and operates without access to internal model parameters, making it applicable to both open-source and proprietary LLMs. To improve search effectiveness under realistic query constraints, we introduce a novel population initialisation strategy based on templates, pre-prompts, and post-prompts. Evolutionary search is guided by heuristic, model-agnostic fitness signals derived from prompt goal semantic similarity, refusal based response assessment, and a small heuristic lexical bonus based on lightweight instruction-following indicators. We evaluate our framework across multiple LLMs using a refusal based attack success rate metric, demonstrating consistent improvements over direct dataset prompting and competitive performance against a state-of-the-art white-box baseline under comparable query budgets. Additional analyses examine fitness stabilisation and cross-model transferability for unseen models.
Sun, Qiyang
53f85493-7cc2-4041-a50a-4d26d2d36a5d
Karafili, Erisa
f5efa31c-22b8-443e-8107-e488bd28918e
Sun, Qiyang
53f85493-7cc2-4041-a50a-4d26d2d36a5d
Karafili, Erisa
f5efa31c-22b8-443e-8107-e488bd28918e
Sun, Qiyang and Karafili, Erisa
(2026)
An evolutionary black-box framework for adversarial prompt generation in large language models.
CODASPY'26: Sixteenth ACM Conference on Data and Application Security and Privacy, , Frankfurt am Main, Germany.
23 - 25 Jun 2026.
11 pp
.
(In Press)
Record type:
Conference or Workshop Item
(Paper)
Abstract
Large language models (LLMs) remain susceptible to adversarial prompts that can bypass alignment mechanisms. Existing approaches to adversarial prompt generation typically rely on manual prompt engineering, helper LLMs, or white-box adversarial machine learning methods, which either lack scalability or require access to model internals. In this paper, we propose a novel black-box framework for automated adversarial prompt generation based on evolutionary algorithms. The framework is instantiated using a genetic algorithm and an evolution strategy and operates without access to internal model parameters, making it applicable to both open-source and proprietary LLMs. To improve search effectiveness under realistic query constraints, we introduce a novel population initialisation strategy based on templates, pre-prompts, and post-prompts. Evolutionary search is guided by heuristic, model-agnostic fitness signals derived from prompt goal semantic similarity, refusal based response assessment, and a small heuristic lexical bonus based on lightweight instruction-following indicators. We evaluate our framework across multiple LLMs using a refusal based attack success rate metric, demonstrating consistent improvements over direct dataset prompting and competitive performance against a state-of-the-art white-box baseline under comparable query budgets. Additional analyses examine fitness stabilisation and cross-model transferability for unseen models.
Text
CODASPY_2026_Camera_Ready_Pure
- Accepted Manuscript
More information
Accepted/In Press date: 27 February 2026
Venue - Dates:
CODASPY'26: Sixteenth ACM Conference on Data and Application Security and Privacy, , Frankfurt am Main, Germany, 2026-06-23 - 2026-06-25
Identifiers
Local EPrints ID: 511265
URI: http://eprints.soton.ac.uk/id/eprint/511265
PURE UUID: dbc28b71-4dfe-4a78-a3b0-e97ea1204b58
Catalogue record
Date deposited: 11 May 2026 16:34
Last modified: 12 May 2026 02:18
Export record
Contributors
Author:
Qiyang Sun
Author:
Erisa Karafili
Download statistics
Downloads from ePrints over the past year. Other digital versions may also be available to download e.g. from the publisher's website.
View more statistics