Modelling the emergence of a basis for vocal communication between artificial agents
Worgan, Simon F. (2010) Modelling the emergence of a basis for vocal communication between artificial agents. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 152pp.
Record type: Thesis (Doctoral)
Abstract
Understanding the human faculty for speech presents a fundamental and complex problem. We do not know how humans decode the rapid speech signal, and the origins and evolution of speech remain shrouded in mystery. Speakers generate a continuous stream of sounds apparently devoid of any specifying invariant features. Despite this absence, we can effortlessly decode this stream and comprehend the utterances of others. Moreover, the form of these utterances is shared and mutually understood by a large population of speakers. In this thesis, we present a multi-agent model that simulates the emergence of a system of shared auditory features and articulatory tokens. Based upon notions of intentionality and the absence of specifying invariants, each agent produces and perceives speech, learning to control an articulatory model of the vocal tract and perceiving the resulting signal through a biologically plausible artificial auditory system. By firmly grounding each aspect of our model in current phonetic theory, we are able to make useful claims and justify our inevitable abstractions. For example, Lindblom’s theory of hyper- and hypo-articulation, in which speakers seek maximum auditory distinction for minimal articulatory effort, justifies our choice of an articulatory vocal tract coupled with a direct measure of effort. By removing the abstractions of previous phonetic models, we have been able to reconsider the current assumption that specifying invariants, in either the auditory or articulatory domain, must indicate the presence of auditory or articulatory symbolic tokens in the cognitive domain. Rather, we consider speech perception to proceed through Gibsonian direct realism, where the signal is manipulated by the speaker to enable the perception of the affordances within speech.
We conclude that the speech signal is constrained by the intention of the speaker and the structure of the vocal tract, and decoded through an interaction of the peripheral auditory system and complex pattern recognition of multiple acoustic cues. Far from passive ‘variance mopping’, this recognition proceeds through the constant refinement of an unbroken loop between production and perception.
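The hyper- and hypo-articulation trade-off invoked above can be sketched as a simple objective: reward auditory distinctiveness, penalise articulatory effort, and pick the articulation that maximises the balance. This is an illustrative sketch only, not the thesis's actual formulation; the candidate values and the `effort_weight` parameter are hypothetical.

```python
def hh_score(distinctiveness, effort, effort_weight=0.7):
    """Lindblom-style H&H utility (illustrative): auditory
    distinctiveness minus weighted articulatory effort.
    Both inputs are assumed to be scalar scores in [0, 1]."""
    return distinctiveness - effort_weight * effort

# Hypothetical candidate articulations: (name, distinctiveness, effort)
candidates = [
    ("hypo-articulated", 0.3, 0.1),   # cheap to produce, acoustically vague
    ("neutral",          0.6, 0.4),
    ("hyper-articulated", 0.9, 0.9),  # acoustically clear, costly
]

best = max(candidates, key=lambda c: hh_score(c[1], c[2]))
print(best[0])  # → neutral
```

With this (hypothetical) weighting, neither extreme wins: the speaker settles on an intermediate articulation, which is the qualitative behaviour the H&H account predicts.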
More information
Published date: January 2010
Organisations: University of Southampton
Identifiers
Local EPrints ID: 79524
URI: http://eprints.soton.ac.uk/id/eprint/79524
PURE UUID: a8938c1d-7830-4bbe-b303-724f48dc6d31
Catalogue record
Date deposited: 16 Mar 2010
Last modified: 14 Mar 2024 00:31
Contributors
Author: Simon F. Worgan
Thesis advisor: Robert Damper