Emergent visual communication

Our ability to perceive, represent and understand the surrounding visual world is one of the most fascinating, but equally intricate, functions of our nervous system. Supported by a sufficiently complex brain, we begin to learn and develop cognitive abilities and skills such as communication from the moment we are born. These capabilities are essential to numerous tasks we carry out in daily life. The development of such intelligence is promoted by exploration of the environment and by social interaction. As such, developing artificial intelligent agents capable of interacting through communication with each other and with humans has been a long-standing goal. This thesis seeks to uncover how inter-agent communication about the visual world, emerging in a completely self-supervised way, can be modelled and its interpretability improved. The research draws inspiration from how human communication developed and first compares the processes involved in transmitting meaningful information between humans and machines. In the context of referential signalling games played with realistic images, intelligent agents modelled as deep neural networks have previously been shown to develop successful token-based communication protocols to achieve a shared goal. This thesis analyses the factors which influence the emergence of meaningful protocols and shows that visual semantics can be learned in a self-supervised way. Nonetheless, qualitative and quantitative insights into emergent token-based communication are not easily explained to humans. We thus propose drawing as a communication channel, a much simpler and more directly interpretable modality than language. To enable end-to-end learnable models of visual communication, we propose a differentiable relaxation of the process of drawing vector primitives into pixel rasters. Using this approach, the physical act of drawing with a pen on paper can be modelled.
We then demonstrate that agents cooperating in a signalling game learn to communicate through sketching. An extensive analysis of the factors which influence the meaning and intent of agents’ drawings is presented. The final two chapters show how interpretable sketches emerge when visual perceptual similarity constraints are induced. Through human evaluation of the emergent visual communication, we explore how, with appropriate inductive biases, artificial agents learn to draw in a fashion that humans can interpret.

University of Southampton

Mihai, Andreea Daniela

f8910fe1-18e7-45b3-8923-b34b5cd136fa

September 2022

Hare, Jonathon

65ba2cda-eaaf-4767-a325-cd845504e5a9

Mihai, Andreea Daniela (2022) Emergent visual communication. University of Southampton, Doctoral Thesis, 186pp.

Record type: Thesis (Doctoral)

Emergent Visual Communication-archival - Version of Record

Available under License University of Southampton Thesis Licence.

More information

Published date: September 2022

Identifiers

Local EPrints ID: 469905

URI: http://eprints.soton.ac.uk/id/eprint/469905

PURE UUID: ff98f1f5-f206-480c-97f8-3c0634361170

ORCID for Andreea Daniela Mihai: orcid.org/0000-0003-3368-9062

ORCID for Jonathon Hare: orcid.org/0000-0003-2921-4283

Catalogue record

Date deposited: 28 Sep 2022 16:49

Last modified: 17 Mar 2024 03:05

Contributors

Author: Andreea Daniela Mihai

Thesis advisor: Jonathon Hare
