The University of Southampton
University of Southampton Institutional Repository

Information bottleneck theory based exploration of cascade learning



Du, Xin, Farrahi, Katayoun and Niranjan, Mahesan (2021) Information bottleneck theory based exploration of cascade learning. Entropy, 23 (10), [1360]. (doi:10.3390/e23101360).

Record type: Article

Abstract

In solving challenging pattern recognition problems, deep neural networks have shown excellent performance by forming powerful mappings between inputs and targets, learning representations (features) and making subsequent predictions. A recent tool to help understand how representations are formed is based on observing the dynamics of learning on an information plane using mutual information, linking the input to the representation (I(X; T)) and the representation to the target (I(T; Y)). In this paper, we use an information-theoretic approach to understand how Cascade Learning (CL), a method to train deep neural networks layer-by-layer, learns representations, as CL has shown comparable results while saving computation and memory costs. We observe that performance is not linked to information compression, which differs from observations on End-to-End (E2E) learning. Additionally, CL can inherit information about targets, and gradually specialise extracted features layer-by-layer. We evaluate this effect by proposing an information transition ratio, I(T; Y)/I(X; T), and show that it can serve as a useful heuristic in setting the depth of a neural network that achieves satisfactory classification accuracy.
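
The information transition ratio I(T; Y)/I(X; T) named in the abstract can be illustrated with a minimal sketch. It assumes all three variables are discrete (or already discretised) and uses a simple plug-in estimate of mutual information from paired samples; the estimator, the function names and the toy data are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Plug-in estimate of I(A; B) in bits from paired discrete samples.

    Assumption: a and b are equal-length sequences of hashable values.
    """
    n = len(a)
    joint = Counter(zip(a, b))
    count_a = Counter(a)
    count_b = Counter(b)
    mi = 0.0
    for (va, vb), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[va] * count_b[vb]))
    return mi


def information_transition_ratio(x, t, y):
    """Return I(T; Y) / I(X; T) for inputs x, representation t, targets y."""
    i_xt = mutual_information(x, t)
    if i_xt == 0.0:
        raise ValueError("I(X; T) is zero, so the ratio is undefined")
    return mutual_information(t, y) / i_xt


# Hypothetical toy data: 8 distinct inputs, a representation that merges
# inputs in pairs, and a binary target that depends only on the representation.
x = list(range(8))
t = [v // 2 for v in x]
y = [v // 4 for v in x]

ratio = information_transition_ratio(x, t, y)
print(ratio)
```

In this toy case the representation keeps 2 bits about the input, 1 bit of which is relevant to the target, so the ratio is 0.5. For real network activations, a discretisation step or a different mutual information estimator would be needed before a function like this could be applied.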

Text
entropy-23-01360-v3 - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 18 October 2021
Published date: 18 October 2021
Additional Information: Author Contributions: Conceptualization, all authors; methodology, X.D.; software, X.D.; validation, X.D.; formal analysis, X.D.; visualization, X.D.; writing—original draft preparation, all authors; supervision, K.F. and M.N. All authors have read and agreed to the published version of the manuscript. Funding: M.N.’s contribution is partially funded by EPSRC grant “Artificial and Augmented Intelligence for Automated Scientific Discovery” (EP/S000356/1). Publisher Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. Copyright 2021 Elsevier B.V., All rights reserved.
Keywords: Cascade Learning, Information bottleneck theory, Neural networks

Identifiers

Local EPrints ID: 452209
URI: http://eprints.soton.ac.uk/id/eprint/452209
PURE UUID: ef574a78-24e9-4a4f-9bf2-284e1fc1548c
ORCID for Katayoun Farrahi: orcid.org/0000-0001-6775-127X
ORCID for Mahesan Niranjan: orcid.org/0000-0001-7021-140X

Catalogue record

Date deposited: 30 Nov 2021 17:31
Last modified: 07 Sep 2022 01:54

Contributors

Author: Xin Du
Author: Katayoun Farrahi
Author: Mahesan Niranjan
