A survey of safety and trustworthiness of large language models through the lens of verification and validation

Large Language Models (LLMs) have set off a new wave of interest in AI through their ability to engage end-users in human-level conversations, giving detailed and articulate answers across many knowledge domains. In response to their fast adoption in many industrial applications, this survey concerns their safety and trustworthiness. First, we review known vulnerabilities and limitations of LLMs, categorising them into inherent issues, attacks, and unintended bugs. Then, we consider if and how Verification and Validation (V&V) techniques can be integrated and further extended throughout the lifecycle of LLMs to provide rigorous analysis of the safety and trustworthiness of LLMs and their applications. These techniques have been widely developed for traditional software and for deep learning models such as convolutional neural networks, where they serve as independent processes that check implementations against their specifications. Specifically, we consider four complementary techniques: falsification and evaluation, verification, runtime monitoring, and regulations and ethical use. In total, 370+ references are considered to support a quick understanding of the safety and trustworthiness issues from the perspective of V&V. While intensive research has been conducted to identify the safety and trustworthiness issues, rigorous yet practical methods are still needed to ensure the alignment of LLMs with safety and trustworthiness requirements.

cs.AI, cs.LG

10.48550/arXiv.2305.11391

Huang, Xiaowei

Ruan, Wenjie

Huang, Wei

Jin, Gaojie

Dong, Yi

Wu, Changshun

Bensalem, Saddek

Mu, Ronghui

Qi, Yi

Zhao, Xingyu

Cai, Kaiwen

Zhang, Yanghao

Wu, Sihao

Xu, Peipei

Wu, Dengyu

Freitas, Andre

Mustafa, Mustafa A.

19 May 2023

Text

2305.11391v2 - Author's Original

Available under License Creative Commons Attribution.

More information

Published date: 19 May 2023

Keywords: cs.AI, cs.LG

Identifiers

Local EPrints ID: 483955

URI: http://eprints.soton.ac.uk/id/eprint/483955

DOI: 10.48550/arXiv.2305.11391

PURE UUID: 245a29cb-7563-49ce-96c2-2aec53bb64ac

ORCID for Yi Dong:

orcid.org/0000-0003-3047-7777

Catalogue record

Date deposited: 07 Nov 2023 18:53

Last modified: 18 Mar 2024 04:17

Contributors

Author: Xiaowei Huang

Author: Wenjie Ruan

Author: Wei Huang

Author: Gaojie Jin

Author: Yi Dong

Author: Changshun Wu

Author: Saddek Bensalem

Author: Ronghui Mu

Author: Yi Qi

Author: Xingyu Zhao

Author: Kaiwen Cai

Author: Yanghao Zhang

Author: Sihao Wu

Author: Peipei Xu

Author: Dengyu Wu

Author: Andre Freitas

Author: Mustafa A. Mustafa
