The University of Southampton
University of Southampton Institutional Repository

Spherical motion dynamics: learning dynamics of normalized neural network using SGD and weight decay

Sun, Jian, Wan, Ruosi, Zhang, Xiangyu and Zhu, Zhanxing (2021) Spherical motion dynamics: learning dynamics of normalized neural network using SGD and weight decay. In, Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P.S. and Wortman Vaughan, J. (eds.) Advances in Neural Information Processing Systems 34. (Advances in Neural Information Processing Systems, 34) 35th Conference on Neural Information Processing Systems (06/12/21 - 14/12/21) Neural Information Processing Systems Foundation, pp. 6380-6391.

Record type: Book Section

Abstract

In this paper, we comprehensively reveal the learning dynamics of normalized neural networks trained with Stochastic Gradient Descent (with momentum) and Weight Decay (WD), which we name Spherical Motion Dynamics (SMD). Most related works on this topic focus on studying the "effective learning rate" under an "equilibrium" assumption, i.e., assuming that the weight norm has converged to a fixed value. However, their discussion of why equilibrium can be reached is either absent or unjustified. To clarify the underlying mechanism, our work directly explores the cause of equilibrium, which should be regarded as a special state of SMD. Specifically, 1) we introduce the assumptions that lead to the equilibrium state in SMD, and prove that equilibrium can be reached at a linear rate; 2) we propose "angular update" as a substitute for the effective learning rate to depict the state of SMD, and derive the theoretical value of the angular update in the equilibrium state; 3) we verify our assumptions and theoretical results on various large-scale computer vision tasks, including ImageNet and MSCOCO, with standard settings. Experimental results show that our theoretical findings agree well with empirical observations. Furthermore, we provide intuitive interpretations showing how the behavior of the angular update in SMD affects the optimization of neural networks and yields unexpected phenomena in practice. We believe our findings and theoretical results can deepen our understanding of current training techniques for deep neural networks.
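The abstract does not state the closed-form equilibrium value of the angular update. The following is therefore a toy illustration of the idea, not the authors' code or theorem. It assumes plain SGD without momentum, a single scale-invariant weight vector, and a synthetic stochastic gradient that is orthogonal to the weight with norm `c / ||w||`. These assumptions mimic the fact that gradients of normalized layers are orthogonal to the weight and scale inversely with its norm.

Under these assumptions, the update `w <- (1 - eta*lam) w - eta*g` gives `||w_{t+1}||^2 = (1 - eta*lam)^2 ||w_t||^2 + eta^2 c^2 / ||w_t||^2`. In this toy model the fixed point of that map attracts at a linear (geometric) rate. At the fixed point the angular update satisfies `cos(Delta) = 1 - eta*lam`, so `Delta ~ sqrt(2*eta*lam)` for small `eta*lam`. The function `simulate_smd` and its parameters are hypothetical.

```python
import numpy as np


def simulate_smd(eta=0.5, lam=0.01, c=1.0, dim=512, steps=3000, seed=0):
    """Toy SGD + weight decay on a scale-invariant weight (hypothetical setup).

    Assumption: the stochastic gradient is a random direction orthogonal to w
    whose norm is c / ||w||, mimicking the 1/||w|| gradient scaling of
    normalized (scale-invariant) layers. No momentum is used.
    Returns the weight norm after each step and the angular update (radians).
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(dim) * 5.0  # start far from equilibrium
    norms, angles = [], []
    for _ in range(steps):
        r = rng.standard_normal(dim)
        r -= (r @ w) / (w @ w) * w  # remove the radial component: g is orthogonal to w
        g = r / np.linalg.norm(r) * c / np.linalg.norm(w)
        w_new = w - eta * (g + lam * w)  # SGD step with weight decay
        cos = (w @ w_new) / (np.linalg.norm(w) * np.linalg.norm(w_new))
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
        w = w_new
        norms.append(np.linalg.norm(w))
    return np.array(norms), np.array(angles)


eta, lam, c = 0.5, 0.01, 1.0
norms, angles = simulate_smd(eta, lam, c)

# Equilibrium predicted by the toy model: ||w*||^2 = eta*c / sqrt(eta*lam*(2 - eta*lam)).
x_star = eta * c / np.sqrt(eta * lam * (2 - eta * lam))
print("final norm:", norms[-1], "predicted:", np.sqrt(x_star))
print("final angular update:", angles[-1],
      "arccos(1 - eta*lam):", np.arccos(1 - eta * lam),
      "sqrt(2*eta*lam):", np.sqrt(2 * eta * lam))
```

In this sketch the equilibrium angular update depends only on `eta * lam` and not on the gradient scale `c`, while the equilibrium weight norm does depend on `c`. This is one way to see why an angle-based quantity is a natural descriptor of the equilibrium state.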

This record has no associated files available for download.

More information

Published date: 2021
Venue - Dates: 35th Conference on Neural Information Processing Systems, virtual, 2021-12-06 - 2021-12-14

Identifiers

Local EPrints ID: 486049
URI: http://eprints.soton.ac.uk/id/eprint/486049
PURE UUID: d43a468b-eb93-432a-9bc6-89f44c040e2f

Catalogue record

Date deposited: 08 Jan 2024 17:33
Last modified: 17 Mar 2024 06:41

Contributors

Author: Jian Sun
Author: Ruosi Wan
Author: Xiangyu Zhang
Author: Zhanxing Zhu
Editor: M. Ranzato
Editor: A. Beygelzimer
Editor: Y. Dauphin
Editor: P.S. Liang
Editor: J. Wortman Vaughan
