On the noisy gradient descent that generalizes as SGD
Wu, J., Hu, W., Xiong, H., Huan, J., Braverman, V. and Zhu, Z. (2020) On the noisy gradient descent that generalizes as SGD. In: Daumé III, H. and Singh, A. (eds.) Proceedings of the 37th International Conference on Machine Learning (ICML 2020), 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119. International Machine Learning Society, pp. 10298-10307.
Record type: Book Section
Abstract
The gradient noise of SGD is widely considered to play a central role in the strong generalization observed in deep learning. While past studies confirm that the magnitude and covariance structure of the gradient noise are critical for regularization, it remains unclear whether the class of noise distributions matters. In this work we provide negative results by showing that noise from distribution classes different from that of SGD can also effectively regularize gradient descent. Our finding is based on a novel observation about the structure of the SGD noise: it is the product of the gradient matrix and a sampling noise that arises from the mini-batch sampling procedure. Moreover, the sampling noise unifies two kinds of gradient-regularizing noise in the Gaussian class: one using the (scaled) Fisher as covariance, and one using the gradient covariance of SGD as covariance. Finally, thanks to the flexibility in choosing the noise class, we propose an algorithm that performs noisy gradient descent and generalizes well; a variant of it even benefits large-batch SGD training without hurting generalization.
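The abstract's key structural observation, that the SGD gradient noise is the product of the per-example gradient matrix and a sampling noise coming from mini-batch selection, can be checked numerically. The following is a minimal NumPy sketch, not the paper's code: the gradient matrix G is a random stand-in for per-example gradients, and the sampling vector assumes uniform sampling without replacement.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, B = 100, 5, 10          # examples, parameters, batch size (illustrative)
G = rng.normal(size=(d, n))   # hypothetical per-example gradient matrix

# Full-batch gradient: average of per-example gradients.
full_grad = G.mean(axis=1)

# Mini-batch sampling vector: 1/B on sampled indices, 0 elsewhere.
idx = rng.choice(n, size=B, replace=False)
s = np.zeros(n)
s[idx] = 1.0 / B

# The SGD gradient is the gradient matrix applied to the sampling vector.
sgd_grad = G @ s

# The SGD noise is G applied to the sampling noise, i.e. the deviation
# of the sampling vector from the uniform weighting 1/n.
sampling_noise = s - np.ones(n) / n
noise = G @ sampling_noise

# SGD gradient = full gradient + (gradient matrix x sampling noise).
assert np.allclose(sgd_grad - full_grad, noise)
```

Under this view, swapping the mini-batch sampling noise for another noise class with matched covariance is a one-line change, which is the flexibility the abstract exploits.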
More information
Published date: 2020
Venue - Dates: 37th International Conference on Machine Learning, virtual, 2020-07-13 to 2020-07-18
Identifiers
Local EPrints ID: 486056
URI: http://eprints.soton.ac.uk/id/eprint/486056
PURE UUID: 86fba35f-6744-49ff-a1a4-b38d7233898e
Catalogue record
Date deposited: 08 Jan 2024 17:35
Last modified: 17 Jan 2024 19:37
Contributors
Author: J. Wu
Author: W. Hu
Author: H. Xiong
Author: J. Huan
Author: V. Braverman
Author: Z. Zhu
Editor: Hal Daumé III
Editor: Aarti Singh