Modelling the emergence of a basis for vocal communication between artificial agents

Worgan, Simon F. (2010) Modelling the emergence of a basis for vocal communication between artificial agents. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 152pp.

Record type: Thesis (Doctoral)

Abstract

Understanding the human faculty for speech presents a fundamental and complex problem. We do not know how humans decode the rapid speech signal, and the origins and evolution of speech remain shrouded in mystery. Speakers generate a continuous stream of sounds apparently devoid of any specifying invariant features. Despite this absence, we can effortlessly decode this stream and comprehend the utterances of others. Moreover, the form of these utterances is shared and mutually understood by a large population of speakers. In this thesis, we present a multi-agent model that simulates the emergence of a system with shared auditory features and articulatory tokens. Based upon notions of intentionality and the absence of specifying invariants, each agent produces and perceives speech, learning to control an articulatory model of the vocal tract and perceiving the resulting signal through a biologically plausible artificial auditory system. By firmly establishing each aspect of our model in current phonetic theory, we are able to make useful claims and justify our inevitable abstractions. For example, Lindblom’s theory of hyper- and hypo-articulation, where speakers seek maximum auditory distinction for minimal articulatory effort, justifies our choice of an articulatory vocal tract coupled with a direct measure of effort. By removing the abstractions of previous phonetic models, we have been able to reconsider the current assumption that specifying invariants, in either the auditory or articulatory domain, must indicate the presence of auditory or articulatory symbolic tokens in the cognitive domain. Rather, we consider speech perception to proceed through Gibsonian direct realism, where the signal is manipulated by the speaker to enable the perception of the affordances within speech.
We conclude that the speech signal is constrained by the intention of the speaker and the structure of the vocal tract, and is decoded through an interaction of the peripheral auditory system and complex pattern recognition of multiple acoustic cues. Far from passive ‘variance mopping’, this recognition proceeds through the constant refinement of an unbroken loop between production and perception.

More information

Published date: January 2010

Organisations: University of Southampton

Identifiers

Local EPrints ID: 79524

URI: http://eprints.soton.ac.uk/id/eprint/79524

PURE UUID: a8938c1d-7830-4bbe-b303-724f48dc6d31

Catalogue record

Date deposited: 16 Mar 2010

Last modified: 14 Mar 2024 00:31

Contributors

Author: Simon F. Worgan

Thesis advisor: Robert Damper
