The University of Southampton
University of Southampton Institutional Repository

The anisotropic noise in stochastic gradient descent: Its behavior of escaping from sharp minima and regularization effects

Zhu, Zhanxing, Wu, Jingfeng, Yu, Bing, Wu, Lei and Ma, Jinwen (2019) The anisotropic noise in stochastic gradient descent: Its behavior of escaping from sharp minima and regularization effects. In: 36th International Conference on Machine Learning, ICML 2019. (International Conference on Machine Learning, ICML, 97) 36th International Conference on Machine Learning (ICML 2019) (09/06/19 - 15/06/19) International Machine Learning Society, pp. 13199-13214.

Record type: Book Section

Abstract

Understanding the behavior of stochastic gradient descent (SGD) in the context of deep neural networks has attracted considerable attention recently. Along this line, we study a general form of gradient-based optimization dynamics with unbiased noise, which unifies SGD and standard Langevin dynamics. Through investigating this general optimization dynamics, we analyze the behavior of SGD in escaping from minima and its regularization effects. A novel indicator is derived to characterize the efficiency of escaping from minima by measuring the alignment between the noise covariance and the curvature of the loss function. Based on this indicator, two conditions are established to show which types of noise structure are superior to isotropic noise in terms of escaping efficiency. We further show that the anisotropic noise in SGD satisfies the two conditions, and thus helps SGD escape effectively from sharp and poor minima towards more stable and flat minima that typically generalize well. We systematically design various experiments to verify the benefits of the anisotropic noise, compared with full gradient descent plus isotropic diffusion (i.e., Langevin dynamics).
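This record carries only the abstract, so the exact indicator and the two conditions are not stated here. The NumPy sketch below is an illustration under stated assumptions: it takes the indicator to be Tr(HΣ), with H the Hessian at a minimum and Σ the noise covariance, and compares it with an isotropic baseline of the same total variance Tr(Σ). It also runs a hypothetical noisy gradient update θ ← θ − η∇L(θ) + ε, ε ~ N(0, Σ), on a quadratic loss L(θ) = ½θᵀHθ; choosing Σ = cI gives Langevin-style isotropic diffusion, while a Σ shaped like the curvature mimics the anisotropic noise the abstract attributes to SGD. The names `escaping_indicator`, `isotropic_counterpart` and `noisy_gd`, the quadratic model and the noise scaling are all illustrative assumptions, not the authors' code.

```python
import numpy as np


def escaping_indicator(H, Sigma):
    """Assumed indicator Tr(H @ Sigma): large when the noise covariance
    Sigma puts its variance along high-curvature directions of H."""
    return float(np.trace(H @ Sigma))


def isotropic_counterpart(Sigma):
    """Isotropic covariance (Tr(Sigma) / d) * I with the same total variance."""
    d = Sigma.shape[0]
    return (np.trace(Sigma) / d) * np.eye(d)


def noisy_gd(H, Sigma, theta0, lr, steps, n_chains, rng):
    """Hypothetical general dynamics on the quadratic loss 0.5 * theta^T H theta:
        theta <- theta - lr * H theta + eps,  eps ~ N(0, Sigma).
    Runs n_chains independent copies and returns their mean loss
    after `steps` updates."""
    d = H.shape[0]
    theta = np.tile(np.asarray(theta0, dtype=float), (n_chains, 1))
    for _ in range(steps):
        eps = rng.multivariate_normal(np.zeros(d), Sigma, size=n_chains)
        # H is symmetric, so each row of theta @ H is the gradient H theta.
        theta = theta - lr * theta @ H + eps
    losses = 0.5 * np.einsum("ni,ij,nj->n", theta, H, theta)
    return float(losses.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = np.diag([10.0, 1.0, 0.1])                   # one sharp direction, two flat ones
    Sigma_aniso = 0.03 * H / np.trace(H)            # noise aligned with curvature, Tr = 0.03
    Sigma_iso = isotropic_counterpart(Sigma_aniso)  # same total variance, no alignment

    for name, Sigma in [("anisotropic", Sigma_aniso), ("isotropic", Sigma_iso)]:
        print(name,
              "Tr(H Sigma) =", round(escaping_indicator(H, Sigma), 4),
              "mean loss after one step from the minimum =",
              round(noisy_gd(H, Sigma, np.zeros(3), 0.05, 1, 20000, rng), 4))
```

Starting exactly at the minimum, a single update gives θ = ε, so the expected loss is ½Tr(HΣ); the Monte Carlo mean should therefore sit close to half the indicator. With equal total variance, the curvature-aligned noise yields the larger Tr(HΣ), which is the sense in which this sketch treats it as escaping the sharp minimum more efficiently.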

This record has no associated files available for download.

More information

Published date: June 2019
Venue - Dates: 36th International Conference on Machine Learning (ICML 2019), Long Beach, United States, 2019-06-09 - 2019-06-15

Identifiers

Local EPrints ID: 486073
URI: http://eprints.soton.ac.uk/id/eprint/486073
PURE UUID: 7d5e0580-7b8c-4c39-941c-9081db0669f4

Catalogue record

Date deposited: 08 Jan 2024 17:50
Last modified: 17 Mar 2024 06:41

Contributors

Author: Zhanxing Zhu
Author: Jingfeng Wu
Author: Bing Yu
Author: Lei Wu
Author: Jinwen Ma
