Effects of momentum in implicit bias of gradient flow for diagonal linear networks
Pages: 19242-19250
Lyu, Bochen
Wang, He
Wang, Zheng
Zhu, Zhanxing
11 April 2025
Lyu, Bochen, Wang, He, Wang, Zheng and Zhu, Zhanxing (2025) Effects of momentum in implicit bias of gradient flow for diagonal linear networks. Walsh, Toby, Shah, Julie and Kolter, Zico (eds.) In Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence. vol. 39, AAAI Press. (doi:10.1609/aaai.v39i18.34118).
Record type: Conference or Workshop Item (Paper)
Abstract
This paper studies the regularization effect of momentum-based methods in regression settings and analyzes the popular diagonal linear networks to precisely characterize the implicit bias of continuous versions of heavy-ball (HB) and Nesterov's method of accelerated gradients (NAG). We show that HB and NAG exhibit a different implicit bias from gradient descent (GD) for diagonal linear networks, in contrast to the classic linear regression problem, where momentum-based methods share the same implicit bias as GD. Specifically, the role of momentum in the implicit bias of GD is twofold: (a) HB and NAG induce extra initialization mitigation effects, similar to SGD, that are beneficial for generalization in sparse regression; (b) the implicit regularization effects of HB and NAG also depend explicitly on the initialization of gradients, which may not be benign for generalization. As a result, whether HB and NAG generalize better than GD depends jointly on these two effects, which are determined by parameters such as the learning rate, the momentum factor, and the integral of gradients. Our findings highlight the potentially beneficial role of momentum and can help explain its advantages in practice, such as when it will lead to better generalization performance.
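For readers unfamiliar with the setting, the following is a minimal sketch of the objects the abstract names, not the authors' code. It assumes the parameterization w = u*u - v*v that is common in the diagonal-linear-network literature, a half mean squared error loss, and the standard discrete-time HB and NAG updates; the paper itself analyzes continuous-time versions, and this record does not state its exact model or hyperparameters. All function names and the toy data are hypothetical.

```python
# Illustrative sketch only: model, names, and step rules are assumptions,
# not taken from the paper.

def weights(u, v):
    """Effective regression vector w = u*u - v*v (elementwise)."""
    return [a * a - b * b for a, b in zip(u, v)]


def residuals(u, v, X, y):
    """Prediction errors <w, x_i> - y_i for every sample."""
    w = weights(u, v)
    return [sum(x * wj for x, wj in zip(row, w)) - yi for row, yi in zip(X, y)]


def loss(u, v, X, y):
    """Half mean squared error of the diagonal linear network."""
    return 0.5 * sum(r * r for r in residuals(u, v, X, y)) / len(y)


def grads(u, v, X, y):
    """Gradients of the loss with respect to u and v via the chain rule."""
    r = residuals(u, v, X, y)
    n = len(y)
    g_w = [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(len(u))]
    gu = [2.0 * a * g for a, g in zip(u, g_w)]
    gv = [-2.0 * b * g for b, g in zip(v, g_w)]
    return gu, gv


def momentum_step(u, v, pu, pv, X, y, lr, mu, nesterov=False):
    """One discrete HB step, or a NAG step if nesterov=True; mu=0 is plain GD."""
    if nesterov:
        # NAG evaluates the gradient at the look-ahead point theta + mu * p.
        gu, gv = grads([a + mu * p for a, p in zip(u, pu)],
                       [b + mu * p for b, p in zip(v, pv)], X, y)
    else:
        gu, gv = grads(u, v, X, y)
    pu = [mu * p - lr * g for p, g in zip(pu, gu)]
    pv = [mu * p - lr * g for p, g in zip(pv, gv)]
    return ([a + p for a, p in zip(u, pu)],
            [b + p for b, p in zip(v, pv)], pu, pv)


if __name__ == "__main__":
    # Toy underdetermined problem (2 samples, 3 features); values are made up.
    X = [[1.0, 2.0, 0.5], [0.0, 1.0, -1.0]]
    y = [1.0, -0.5]
    u, v = [0.1] * 3, [0.1] * 3      # small, balanced initialization
    pu, pv = [0.0] * 3, [0.0] * 3    # zero initial momentum
    for _ in range(1000):
        u, v, pu, pv = momentum_step(u, v, pu, pv, X, y, lr=0.01, mu=0.9)
    print(weights(u, v), loss(u, v, X, y))
```

Running the loop with mu=0, with mu>0, and with nesterov=True from the same initialization gives a rough way to compare which interpolating solution each method selects in this toy setting.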
Text
34118-Article Text-38186-1-2-20250410 - Version of Record
Restricted to Repository staff only
More information
Published date: 11 April 2025
Identifiers
Local EPrints ID: 507349
URI: http://eprints.soton.ac.uk/id/eprint/507349
ISSN: 2374-3468
PURE UUID: c5308ec5-a7c7-4676-9a80-43e30797f029
Catalogue record
Date deposited: 04 Dec 2025 18:00
Last modified: 05 Dec 2025 03:05
Contributors
Author: Bochen Lyu
Author: He Wang
Author: Zheng Wang
Author: Zhanxing Zhu
Editor: Toby Walsh
Editor: Julie Shah
Editor: Zico Kolter