An evolutionary black-box framework for adversarial prompt generation in large language models

Large language models (LLMs) remain susceptible to adversarial prompts that can bypass alignment mechanisms. Existing approaches to adversarial prompt generation typically rely on manual prompt engineering, helper LLMs, or white-box adversarial machine learning methods, which either lack scalability or require access to model internals. In this paper, we propose a novel black-box framework for automated adversarial prompt generation based on evolutionary algorithms. The framework is instantiated using a genetic algorithm and an evolution strategy and operates without access to internal model parameters, making it applicable to both open-source and proprietary LLMs. To improve search effectiveness under realistic query constraints, we introduce a novel population initialisation strategy based on templates, pre-prompts, and post-prompts. Evolutionary search is guided by heuristic, model-agnostic fitness signals derived from prompt goal semantic similarity, refusal based response assessment, and a small heuristic lexical bonus based on lightweight instruction-following indicators. We evaluate our framework across multiple LLMs using a refusal based attack success rate metric, demonstrating consistent improvements over direct dataset prompting and competitive performance against a state-of-the-art white-box baseline under comparable query budgets. Additional analyses examine fitness stabilisation and cross-model transferability for unseen models.

Sun, Qiyang

53f85493-7cc2-4041-a50a-4d26d2d36a5d

Karafili, Erisa

f5efa31c-22b8-443e-8107-e488bd28918e

Sun, Qiyang

53f85493-7cc2-4041-a50a-4d26d2d36a5d

Karafili, Erisa

f5efa31c-22b8-443e-8107-e488bd28918e

Sun, Qiyang and Karafili, Erisa (2026) An evolutionary black-box framework for adversarial prompt generation in large language models. CODASPY'26: Sixteenth ACM Conference on Data and Application Security and Privacy, , Frankfurt am Main, Germany. 23 - 25 Jun 2026. 11 pp . (In Press)

Record type: Conference or Workshop Item (Paper)

Abstract

Text

CODASPY_2026_Camera_Ready_Pure - Accepted Manuscript

Available under License Creative Commons Attribution No Derivatives.

Download (203kB)

Text

CODASPY_2026

Available under License Creative Commons Attribution No Derivatives.

Download (203kB)

More information

Accepted/In Press date: 27 February 2026

Venue - Dates: CODASPY'26: Sixteenth ACM Conference on Data and Application Security and Privacy, , Frankfurt am Main, Germany, 2026-06-23 - 2026-06-25

Learn more about the School of Electronics and Computer Science Learn more about the Cyber Security