The University of Southampton
University of Southampton Institutional Repository

On the noisy gradient descent that generalizes as SGD

Wu, J., Hu, W., Xiong, H., Huan, J., Braverman, V. and Zhu, Z. (2020) On the noisy gradient descent that generalizes as SGD. In Daumé III, Hal and Singh, Aarti (eds.) 37th International Conference on Machine Learning (ICML 2020), 13/07/20 - 18/07/20. Proceedings of Machine Learning Research, vol. 119. International Machine Learning Society, pp. 10298-10307.

Record type: Book Section

Abstract

The gradient noise of SGD is considered to play a central role in the strong generalization ability observed in deep learning. While past studies confirm that the magnitude and covariance structure of gradient noise are critical for regularization, it remains unclear whether the class of the noise distribution also matters. In this work we provide negative results by showing that noises from classes different from that of the SGD noise can also effectively regularize gradient descent. Our finding is based on a novel observation about the structure of the SGD noise: it is the product of the gradient matrix and a sampling noise that arises from the mini-batch sampling procedure. Moreover, the sampling noises unify two kinds of gradient-regularizing noise that belong to the Gaussian class: one that uses the (scaled) Fisher as its covariance and one that uses the gradient covariance of SGD as its covariance. Finally, thanks to the flexibility in choosing the noise class, we propose an algorithm that performs noisy gradient descent that generalizes well; a variant of it even benefits large-batch SGD training without hurting generalization.
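The decomposition described in the abstract can be checked numerically. What follows is a minimal pure-Python sketch, not the authors' code. It assumes per-example gradients stacked as the rows of an n x d matrix G, mini-batches of size b drawn without replacement, and a sampling vector W with entries 1/b on the sampled indices and 0 elsewhere. Under these assumptions, the mini-batch gradient is G^T W, so the SGD noise is G^T (W - 1/n). All function names are hypothetical. The Gaussian replacement for W - 1/n matches only its mean and covariance. It is one illustrative choice of a different noise class and is not necessarily the construction used in the paper.

```python
import math
import random


def full_gradient(G):
    """Average of the per-example gradients (rows of the n x d matrix G)."""
    n, d = len(G), len(G[0])
    return [sum(G[i][k] for i in range(n)) / n for k in range(d)]


def weighted_gradient(G, w):
    """Compute G^T w, i.e. sum_i w[i] * G[i]."""
    d = len(G[0])
    return [sum(w_i * row[k] for w_i, row in zip(w, G)) for k in range(d)]


def minibatch_weights(n, b, rng):
    """Sampling vector W: 1/b on b indices drawn without replacement, 0 elsewhere."""
    w = [0.0] * n
    for i in rng.sample(range(n), b):
        w[i] = 1.0 / b
    return w


def sgd_noise(G, w):
    """SGD noise as the gradient matrix times the sampling noise W - 1/n."""
    n = len(G)
    return weighted_gradient(G, [w_i - 1.0 / n for w_i in w])


def gaussian_sampling_noise(n, b, rng):
    """Hypothetical Gaussian stand-in for W - 1/n with the same mean (zero) and
    covariance c * (n*I - 1 1^T), where c = (n - b) / (b * n**2 * (n - 1))."""
    c = (n - b) / (b * n ** 2 * (n - 1))
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z_bar = sum(z) / n
    scale = math.sqrt(c * n)
    # Centering projects z onto the sum-zero subspace: covariance (I - 1 1^T / n).
    return [scale * (z_i - z_bar) for z_i in z]


def noisy_gd_step(theta, G, lr, b, rng):
    """One hypothetical noisy GD step: full gradient plus G^T eps, eps Gaussian."""
    eps = gaussian_sampling_noise(len(G), b, rng)
    direction = [g + e for g, e in zip(full_gradient(G), weighted_gradient(G, eps))]
    return [t - lr * d_k for t, d_k in zip(theta, direction)]


# Toy example: n = 4 per-example gradients in d = 2 dimensions.
rng = random.Random(0)
G = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5], [-2.0, 4.0]]
w = minibatch_weights(len(G), 2, rng)
g = full_gradient(G)
g_batch = weighted_gradient(G, w)
noise = sgd_noise(G, w)
# The mini-batch gradient equals the full gradient plus G^T (W - 1/n).
assert all(abs(gb - (gk + nk)) < 1e-12 for gb, gk, nk in zip(g_batch, g, noise))
theta = noisy_gd_step([0.0, 0.0], G, lr=0.1, b=2, rng=rng)
```

Because G^T eps has covariance G^T Cov(eps) G, matching the covariance of the sampling noise also matches the covariance of the resulting gradient noise. In this sketch only the distribution class of the n-dimensional sampling noise changes. In practice G would be recomputed at every iterate theta.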

This record has no associated files available for download.

More information

Published date: 2020
Venue - Dates: 37th International Conference on Machine Learning, virtual, 2020-07-13 - 2020-07-18

Identifiers

Local EPrints ID: 486056
URI: http://eprints.soton.ac.uk/id/eprint/486056
PURE UUID: 86fba35f-6744-49ff-a1a4-b38d7233898e

Catalogue record

Date deposited: 08 Jan 2024 17:35
Last modified: 17 Jan 2024 19:37

Contributors

Author: J. Wu
Author: W. Hu
Author: H. Xiong
Author: J. Huan
Author: V. Braverman
Author: Z. Zhu
Editor: Hal Daumé III
Editor: Aarti Singh