University of Southampton Institutional Repository

Modelling the emergence of a basis for vocal communication between artificial agents

Worgan, Simon F. (2010) Modelling the emergence of a basis for vocal communication between artificial agents. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 152pp.

Record type: Thesis (Doctoral)


Understanding the human faculty for speech presents a fundamental and complex problem. We do not know how humans decode the rapid speech signal, and the origins and evolution of speech remain shrouded in mystery. Speakers generate a continuous stream of sounds apparently devoid of any specifying invariant features. Despite this absence, we can effortlessly decode this stream and comprehend the utterances of others. Moreover, the form of these utterances is shared and mutually understood by a large population of speakers. In this thesis, we present a multi-agent model that simulates the emergence of a system with shared auditory features and articulatory tokens. Based upon notions of intentionality and the absence of specifying invariants, each agent produces and perceives speech, learning to control an articulatory model of the vocal tract and perceiving the resulting signal through a biologically plausible artificial auditory system. By firmly establishing each aspect of our model in current phonetic theory, we are able to make useful claims and justify our inevitable abstractions. For example, Lindblom’s theory of hyper- and hypo-articulation, where speakers seek maximum auditory distinction for minimal articulatory effort, justifies our choice of an articulatory vocal tract coupled with a direct measure of effort. By removing the abstractions of previous phonetic models, we have been able to reconsider the current assumption that specifying invariants, in either the auditory or articulatory domain, must indicate the presence of auditory or articulatory symbolic tokens in the cognitive domain. Rather, we consider speech perception to proceed through Gibsonian direct realism, where the signal is manipulated by the speaker to enable the perception of the affordances within speech.
We conclude that the speech signal is constrained by the intention of the speaker and the structure of the vocal tract, and is decoded through an interaction of the peripheral auditory system and complex pattern recognition of multiple acoustic cues. Far from passive ‘variance mopping’, this recognition proceeds through the constant refinement of an unbroken loop between production and perception.


More information

Published date: January 2010
Organisations: University of Southampton


Local EPrints ID: 79524
PURE UUID: a8938c1d-7830-4bbe-b303-724f48dc6d31

Catalogue record

Date deposited: 16 Mar 2010
Last modified: 18 Jul 2017 23:17


Author: Simon F. Worgan
Thesis advisor: Robert Damper
