Spherical motion dynamics: learning dynamics of normalized neural network using SGD and weight decay
6380-6391
Neural Information Processing Systems Foundation
Sun, Jian
Wan, Ruosi
Zhang, Xiangyu
Zhu, Zhanxing
2021
Sun, Jian, Wan, Ruosi, Zhang, Xiangyu and Zhu, Zhanxing (2021) Spherical motion dynamics: learning dynamics of normalized neural network using SGD and weight decay. In, Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P.S. and Wortman Vaughan, J. (eds.) Advances in Neural Information Processing Systems 34. (Advances in Neural Information Processing Systems, 34) 35th Conference on Neural Information Processing Systems (06/12/21 - 14/12/21) Neural Information Processing Systems Foundation, pp. 6380-6391.
Record type: Book Section
Abstract
In this paper, we comprehensively reveal the learning dynamics of normalized neural networks trained with Stochastic Gradient Descent (with momentum) and Weight Decay (WD), named Spherical Motion Dynamics (SMD). Most related works on this topic focus on studying the “effective learning rate” under an “equilibrium” assumption, i.e. assuming the weight norm has converged to a fixed value. However, their discussion of why equilibrium can be reached is either absent or unjustified. To clarify the underlying mechanism, our work directly explores the cause of equilibrium, which should be regarded as a special state of SMD. Specifically, 1) we introduce the assumptions that can lead to the equilibrium state in SMD, and prove that equilibrium can be reached in a linear rate regime; 2) we propose the “angular update” as a substitute for the effective learning rate to depict the state of SMD, and derive the theoretical value of the angular update in the equilibrium state; 3) we verify our assumptions and theoretical results on various large-scale computer vision tasks, including ImageNet and MSCOCO, with standard settings. Experimental results show that our theoretical findings agree well with empirical observations. Furthermore, we provide intuitive interpretations, showing how the behavior of the angular update in SMD affects the optimization of neural networks and yields unexpected phenomena in practice. We believe our findings and theoretical results can deepen our understanding of current training techniques for deep neural networks.
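As an illustration of the quantities the abstract names, the following is a minimal sketch and not the authors' code. It assumes that "angular update" means the angle between a weight vector before and after one optimizer step, and it uses a hypothetical toy scale-invariant loss L(w) = -<w/||w||, target> with plain SGD plus weight decay (no momentum). All function names and default values are illustrative assumptions.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def angular_update(w_prev, w_next):
    """Angle in radians between two consecutive weight vectors."""
    cos = dot(w_prev, w_next) / (norm(w_prev) * norm(w_next))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, cos)))


def scale_invariant_grad(w, target):
    """Gradient of the toy loss L(w) = -<w/||w||, target>.

    The loss depends only on the direction of w, so this gradient
    is orthogonal to w.
    """
    n = norm(w)
    u = [x / n for x in w]
    proj = dot(u, target)
    return [-(t - proj * ui) / n for t, ui in zip(target, u)]


def sgd_wd_step(w, grad, lr, wd):
    """One plain SGD step with weight decay (no momentum)."""
    return [x - lr * (g + wd * x) for x, g in zip(w, grad)]


def run(steps=2000, dim=16, lr=0.1, wd=1e-3, seed=0):
    """Return a list of (weight norm, angular update) pairs, one per step."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(dim)]
    history = []
    for _ in range(steps):
        # A fresh random target each step stands in for minibatch noise.
        target = [rng.gauss(0, 1) for _ in range(dim)]
        grad = scale_invariant_grad(w, target)
        w_next = sgd_wd_step(w, grad, lr, wd)
        history.append((norm(w), angular_update(w, w_next)))
        w = w_next
    return history
```

Plotting the two columns returned by `run()` against the step index is one way to inspect how the weight norm and the per-step angle evolve in this toy setting; the sketch makes no claim about the values reported in the paper.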
This record has no associated files available for download.
More information
Published date: 2021
Venue - Dates: 35th Conference on Neural Information Processing Systems, virtual, 2021-12-06 - 2021-12-14
Identifiers
Local EPrints ID: 486049
URI: http://eprints.soton.ac.uk/id/eprint/486049
PURE UUID: d43a468b-eb93-432a-9bc6-89f44c040e2f
Catalogue record
Date deposited: 08 Jan 2024 17:33
Last modified: 17 Mar 2024 06:41
Contributors
Author: Jian Sun
Author: Ruosi Wan
Author: Xiangyu Zhang
Author: Zhanxing Zhu
Editor: M. Ranzato
Editor: A. Beygelzimer
Editor: Y. Dauphin
Editor: P.S. Liang
Editor: J. Wortman Vaughan