Brown, M., An, P.E., Harris, C.J. and Wang, H.
How Biased is Your Multi-Layered Perceptron?
Proc. World Congress on Neural Networks
Gradient descent and instantaneous gradient descent learning rules are popular methods for training neural models. Backwards Error Propagation (BEP) applied to the Multi-Layer Perceptron (MLP) is one example of nonlinear gradient descent, and Widrow's Adaptive Linear Combiner (ALC) and the Albus CMAC are both generally trained using (instantaneous) gradient descent rules. However, these learning algorithms are often applied without regard for the condition of the resultant optimisation problem. Often the basic model can be transformed such that its modelling capabilities remain unchanged, but the condition of the optimisation problem is improved. In this paper, the basic theory behind gradient descent adaptive algorithms will be stated and then applied to a wide range of common neural networks. In the simplest case, it will be shown how the condition of the ALC depends on the domain over which the training data is distributed, and the important concept of orthogonal functions will be introduced. This network is then compared with alternative models (e.g. B-splines) which have identical modelling capabilities but a different internal representation. The theory is then applied to MLPs trained using gradient descent algorithms, and it is shown that MLPs with sigmoids whose outputs lie in the range [0,1] are inherently ill-conditioned, although this can be overcome by simple translation and scaling procedures. All of these results can be generalised to instantaneous gradient descent procedures.
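The ill-conditioning argument can be illustrated numerically. The sketch below (not from the paper; all weights, sizes, and seeds are hypothetical) compares the condition number of the autocorrelation matrix of a hidden layer's outputs for a standard [0,1] sigmoid against the same layer after the simple translation and scaling the abstract mentions, mapping outputs into [-1,1]. The condition number of this matrix governs the convergence rate of (instantaneous) gradient descent on the output-layer weights.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(1000, 1))  # hypothetical training inputs
W = rng.normal(size=(1, 5))                 # hypothetical hidden-layer weights
b = rng.normal(size=5)                      # hypothetical hidden-layer biases

z = x @ W + b
sigmoid_out = 1.0 / (1.0 + np.exp(-z))  # outputs in (0, 1), mean near 0.5
scaled_out = 2.0 * sigmoid_out - 1.0    # translated/scaled into (-1, 1)

def condition_number(h):
    """Condition of the autocorrelation matrix R = E[h h^T] of the
    hidden-layer outputs, which bounds gradient-descent convergence."""
    R = h.T @ h / h.shape[0]
    eig = np.linalg.eigvalsh(R)
    return eig.max() / eig.min()

print("sigmoid [0,1]: ", condition_number(sigmoid_out))
print("scaled [-1,1]: ", condition_number(scaled_out))
```

Because the [0,1] sigmoid outputs all share a large positive mean, the autocorrelation matrix acquires one dominant eigenvalue, inflating its condition number; the translated outputs are closer to zero-mean, so their condition number is markedly smaller, consistent with the abstract's claim.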