The University of Southampton
University of Southampton Institutional Repository

A survey of safety and trustworthiness of large language models through the lens of verification and validation

Huang, Xiaowei
ea80b217-6df4-4708-970d-93303f2a17e5
Ruan, Wenjie
1676cb99-67f1-4c70-90f5-9ab2b54f3ed6
Huang, Wei
bd1464ed-9914-4bab-8eb0-37e1bd50f9bf
Jin, Gaojie
557c0b87-4303-40f3-9639-81b458fbdc86
Dong, Yi
355a62d9-5d1a-4c14-a900-9911e8c62453
Wu, Changshun
c8076c30-3beb-4f0d-bd68-390277f6be1c
Bensalem, Saddek
14e1c08b-ec0a-4d2b-9562-7eebaa4c8c8a
Mu, Ronghui
5cdd24b7-8126-4064-a857-3c6868453554
Qi, Yi
054b21ea-bce4-4506-a328-e38a7f98cd65
Zhao, Xingyu
56d69104-77e5-4741-bca1-c0fa13f433fe
Cai, Kaiwen
b6a7c9c4-ee2e-4975-ae39-67fd29566db9
Zhang, Yanghao
79e82a20-c4fb-4d62-841c-860cae2fcc7f
Wu, Sihao
ea333a04-ef54-4948-98df-78ae0c472906
Xu, Peipei
0a67e9c0-d8ee-4611-9466-03c1b0bd65a8
Wu, Dengyu
428f58dc-6759-4dbd-bd94-f263e0324665
Freitas, Andre
c7a66eef-8f9d-4006-9d6c-cc75e6d6fe19
Mustafa, Mustafa A.
30db5304-1f3e-4260-b381-757f667c8773


Record type: UNSPECIFIED

Abstract

Large Language Models (LLMs) have sparked a new wave of enthusiasm for AI through their ability to engage end users in human-level conversations, giving detailed and articulate answers across many knowledge domains. In response to their rapid adoption in many industrial applications, this survey addresses their safety and trustworthiness. First, we review known vulnerabilities and limitations of LLMs, categorising them into inherent issues, attacks, and unintended bugs. Then, we consider whether and how Verification and Validation (V&V) techniques, which have been widely developed for traditional software and for deep learning models such as convolutional neural networks as independent processes for checking the alignment of implementations with their specifications, can be integrated and further extended throughout the lifecycle of LLMs to provide rigorous analysis of the safety and trustworthiness of LLMs and their applications. Specifically, we consider four complementary techniques: falsification and evaluation, verification, runtime monitoring, and regulations and ethical use. In total, more than 370 references are considered to support a quick understanding of the safety and trustworthiness issues from the perspective of V&V. While intensive research has been conducted to identify these safety and trustworthiness issues, rigorous yet practical methods are still needed to ensure the alignment of LLMs with safety and trustworthiness requirements.
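
As a rough illustration of one of the four techniques named above, runtime monitoring, the following minimal Python sketch wraps a stand-in LLM call with a monitor that checks each output against simple, assumed safety rules before it reaches the end user. This is not a method from the survey itself: the rule set, the generate stub, and the function names are placeholders chosen only for this example; a deployed monitor would rely on vetted classifiers or formal specifications rather than keyword patterns.

import re
from typing import Callable, List

# Hypothetical safety rules for illustration only.
BLOCKED_PATTERNS: List[re.Pattern] = [
    re.compile(r"\bhow to build a bomb\b", re.IGNORECASE),
    re.compile(r"\bsocial security number\b", re.IGNORECASE),
]

def violates_policy(text: str) -> bool:
    """Return True if the text matches any blocked pattern."""
    return any(p.search(text) for p in BLOCKED_PATTERNS)

def monitored_generate(prompt: str, generate: Callable[[str], str]) -> str:
    """Call the underlying LLM and suppress outputs that fail the check."""
    output = generate(prompt)
    if violates_policy(output):
        # Runtime monitoring intercepts at deployment time, complementing
        # pre-deployment falsification, evaluation, and verification.
        return "[response withheld by runtime monitor]"
    return output

if __name__ == "__main__":
    # Stub standing in for an actual LLM call.
    def fake_llm(prompt: str) -> str:
        return "Here is my social security number: 123-45-6789"

    print(monitored_generate("Tell me something.", fake_llm))

The point of the sketch is the architecture, not the rules: the monitor sits between the model and the user as an independent check, in the same spirit as the V&V processes the survey discusses for checking implementations against specifications.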

Text
2305.11391v2 - Author's Original
Available under License Creative Commons Attribution.
Download (1MB)

More information

Published date: 19 May 2023
Keywords: cs.AI, cs.LG

Identifiers

Local EPrints ID: 483955
URI: http://eprints.soton.ac.uk/id/eprint/483955
PURE UUID: 245a29cb-7563-49ce-96c2-2aec53bb64ac
ORCID for Yi Dong: orcid.org/0000-0003-3047-7777

Catalogue record

Date deposited: 07 Nov 2023 18:53
Last modified: 18 Mar 2024 04:17


Contributors

Author: Xiaowei Huang
Author: Wenjie Ruan
Author: Wei Huang
Author: Gaojie Jin
Author: Yi Dong (orcid.org/0000-0003-3047-7777)
Author: Changshun Wu
Author: Saddek Bensalem
Author: Ronghui Mu
Author: Yi Qi
Author: Xingyu Zhao
Author: Kaiwen Cai
Author: Yanghao Zhang
Author: Sihao Wu
Author: Peipei Xu
Author: Dengyu Wu
Author: Andre Freitas
Author: Mustafa A. Mustafa


