University of Southampton Institutional Repository

Artificial organisations

Waites, William
a069e5ff-f440-4b89-ae81-3b58c2ae2afd


Record type: UNSPECIFIED

Abstract

Alignment research focuses on making individual AI systems reliable. Human institutions achieve reliable collective behaviour differently—they mitigate the risk posed by misaligned individuals through organisational structure. Multi-agent AI systems should follow this institutional model: using compartmentalisation and adversarial review to achieve reliable outcomes through architectural design rather than assuming individual alignment.

We demonstrate this approach through the Perseverance Composition Engine, a multi-agent system for document composition. The Composer drafts text, the Corroborator verifies factual substantiation with full source access, and the Critic evaluates argumentative quality without access to sources—information asymmetry enforced by system architecture. This creates layered verification: the Corroborator detects unsupported claims, whilst the Critic independently assesses coherence and completeness. Observations from 474 composition tasks—discrete cycles of drafting, verification, and evaluation—exhibit patterns consistent with the institutional hypothesis. The verification agent detected fabrication in 52% of submitted drafts. Iterative feedback between compartmentalised roles produced 79% quality improvement over 4.3 iterations on average. When assigned impossible tasks requiring fabricated content, this iteration enabled progression from attempted fabrication toward honest refusal with alternative proposals—behaviour neither instructed nor individually incentivised. These findings motivate controlled investigation of whether architectural enforcement produces reliable outcomes from unreliable components.

This positions organisational theory as a productive framework for multi-agent AI safety. By implementing verification and evaluation as structural properties enforced through information compartmentalisation, institutional design offers a route to reliable collective behaviour from unreliable individual components.
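
The architecture the abstract describes can be illustrated compactly. Below is a minimal Python sketch of the compartmentalisation pattern, under stated assumptions: all names (Draft, Corroborator, Critic, compose) and the stub review logic are hypothetical, not the PCE's actual implementation. The Corroborator is the only component constructed with the sources, the Critic is built without them, and an orchestration loop iterates drafting and layered review until both reviewers pass or the iteration budget runs out.

    from dataclasses import dataclass, field

    # Illustrative sketch only: Draft, Corroborator, Critic, and compose are
    # hypothetical names, not the PCE's actual API.

    @dataclass
    class Draft:
        text: str
        claims: list[str] = field(default_factory=list)

    class Corroborator:
        """Checks factual substantiation; the only role constructed with sources."""
        def __init__(self, sources: set[str]):
            self._sources = sources  # full source access, granted by construction

        def review(self, draft: Draft) -> list[str]:
            # Flag any claim with no supporting source (fabrication detection).
            return [f"unsupported: {c}" for c in draft.claims if c not in self._sources]

    class Critic:
        """Assesses coherence and completeness; never given the sources."""
        def review(self, draft: Draft) -> list[str]:
            # Stub heuristic standing in for a real quality evaluation.
            return [] if len(draft.text.split()) >= 50 else ["argument underdeveloped"]

    def compose(task, composer, corroborator, critic, max_iterations=10):
        """Iterate drafting and layered review until both reviewers pass."""
        feedback: list[str] = []
        for _ in range(max_iterations):
            draft = composer(task, feedback)  # Composer: any callable that drafts
            feedback = corroborator.review(draft) + critic.review(draft)
            if not feedback:
                return draft  # accepted by both reviewers
        return None  # honest failure: no accepted draft within the budget

The asymmetry is enforced structurally: the Critic's interface offers no path to the sources, so its assessment is independent of the Corroborator's substantiation check, giving the layered verification the abstract describes.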

Text: artificial-organisations (Author's Original). Available under License Creative Commons Attribution. Download (233kB).
Text: persevere (Other). Available under License Creative Commons Attribution. Download (186kB).

More information

e-pub ahead of print date: 5 February 2026
Additional Information: This draft version of the paper is preserved because it proved significant. When it was provided as a reference document for elaborating the background section (a task that failed due to inadequate source material), PCE spontaneously requested the addition of a case study on "Honest Refusal Under Impossible Task Constraints", just as described in this paper. The final document will need to cite this draft verbatim.
Keywords: artificial organisations, multi-agent systems, organisational theory, institutional design, transactive memory, information compartmentalisation, epistemic integrity, model organisms, distributed cognition, institutional memory

Identifiers

Local EPrints ID: 508768
URI: http://eprints.soton.ac.uk/id/eprint/508768
PURE UUID: 9d396e53-6fa4-48f7-867a-b799c58b4d18
ORCID for William Waites: orcid.org/0000-0002-7759-6805

Catalogue record

Date deposited: 03 Feb 2026 17:38
Last modified: 10 Feb 2026 03:23


Contributors

Author: William Waites

