An investigation into neural arithmetic logic modules
The human ability to learn and reuse skills in a systematic manner is critical to our daily routines. For example, having the skills for executing the basic arithmetic operations (+,−,×,÷) allows a person to perform a variety of tasks including budgeting expenses, scaling measurements to the desired proportions when cooking/baking, and planning travel schedules. Machine Learning (ML) can reduce the manual workload for humans, inferring underlying relations within the data without the need for heavy feature engineering. However, getting such models to extrapolate and generalise to unseen data in an interpretable manner remains challenging. With this challenge in mind, Neural Arithmetic Logic Modules (NALMs) have been developed. Such parameterised modules, specialised for arithmetic operations, are designed to guarantee generalisation if weights are correctly learned and to be interpretable in what they learn. This thesis seeks to thoroughly investigate the proposition that such specialised differentiable modules with inductive biases toward arithmetic can be learned, uncovering the limitations which remain. In this work, we begin by studying the extent to which NALMs are able to learn arithmetic. We initially provide a comprehensive review of existing NALMs and take our analysis a step further with empirical results on a new benchmark with evaluation metrics specifically for measuring extrapolation performance. From this, we identify two arithmetic operations to further investigate, namely multiplication and division. For multiplication, we show how stochasticity can be applied to alleviate the issue of falling into local minima which cannot extrapolate. For division, we show through an extensive set of empirical results the mechanisms which can aid and hinder robustness. Factors other than the architecture are also investigated, including using images as the input modality, using a different loss criterion, and feature scaling.
In the final chapter, we draw inspiration from a human cognitive theory, the Global Workspace Theory (GWT), to develop an end-to-end architecture to combine different NALMs for compositional arithmetic.
University of Southampton
Mistry, Bhumika
36ac2f06-1a50-4c50-ab5e-a57c3faab549
July 2023
Farrahi, Kate
bc848b9c-fc32-475c-b241-f6ade8babacb
Hare, Jonathon
65ba2cda-eaaf-4767-a325-cd845504e5a9
Mistry, Bhumika (2023) An investigation into neural arithmetic logic modules. University of Southampton, Doctoral Thesis, 214pp.
Record type: Thesis (Doctoral)
Text
Doctoral Thesis PDFA: An Investigation into Neural Arithmetic Logic Modules by Mistry
- Version of Record
Text
Final-thesis-submission-Examination-Miss-Bhumika-Mistry
Restricted to Repository staff only
More information
Published date: July 2023
Identifiers
Local EPrints ID: 478926
URI: http://eprints.soton.ac.uk/id/eprint/478926
PURE UUID: a0203875-10ea-42ba-8d8a-af42901361f4
Catalogue record
Date deposited: 14 Jul 2023 16:31
Last modified: 18 Mar 2024 03:03
Contributors
Author: Bhumika Mistry
Thesis advisor: Kate Farrahi
Thesis advisor: Jonathon Hare