The University of Southampton
University of Southampton Institutional Repository

Effects of momentum in implicit bias of gradient flow for diagonal linear networks


Lyu, Bochen, Wang, He, Wang, Zheng and Zhu, Zhanxing (2025) Effects of momentum in implicit bias of gradient flow for diagonal linear networks. Walsh, Toby, Shah, Julie and Kolter, Zico (eds.) In Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence. vol. 39, AAAI Press. pp. 19242-19250. (doi:10.1609/aaai.v39i18.34118).

Record type: Conference or Workshop Item (Paper)

Abstract

This paper targets the regularization effect of momentum-based methods in regression settings and analyzes the popular diagonal linear networks to precisely characterize the implicit bias of continuous versions of heavy-ball (HB) and Nesterov's method of accelerated gradients (NAG). We show that HB and NAG exhibit a different implicit bias from gradient descent (GD) for diagonal linear networks, in contrast to the classic linear regression problem, where momentum-based methods share the same implicit bias as GD. Specifically, the role of momentum in the implicit bias of GD is twofold: (a) HB and NAG induce extra initialization mitigation effects similar to SGD that are beneficial for generalization in sparse regression; (b) the implicit regularization effects of HB and NAG also depend explicitly on the initialization of gradients, which may not be benign for generalization. As a result, whether HB and NAG generalize better than GD depends jointly on these two effects, which are determined by parameters such as the learning rate, the momentum factor, and the integral of gradients. Our findings highlight the potentially beneficial role of momentum and can help explain its advantages in practice, such as when it leads to better generalization performance.
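
For readers unfamiliar with the setting, the following is a minimal illustrative sketch and is not taken from the paper. It assumes the commonly used diagonal linear network parameterization theta = u*u - v*v, a discrete heavy-ball update with momentum buffers initialized to zero, and a tiny hand-made underdetermined regression problem. The function names, the data, and the hyperparameter values are all hypothetical. Setting the momentum factor to zero recovers plain GD, so the two runs can be compared side by side.

```python
# Illustrative sketch only. The parameterization theta = u*u - v*v and the
# discrete heavy-ball update are assumptions, not the paper's exact setup.

def predict_weights(u, v):
    """Effective regression vector of a diagonal linear network (assumed form)."""
    return [ui * ui - vi * vi for ui, vi in zip(u, v)]

def loss_and_grad(X, y, theta):
    """Squared loss 1/(2n) * sum (x.theta - y)^2 and its gradient w.r.t. theta."""
    n = len(X)
    residuals = [sum(xj * tj for xj, tj in zip(row, theta)) - yi
                 for row, yi in zip(X, y)]
    loss = sum(r * r for r in residuals) / (2 * n)
    grad = [sum(X[i][j] * residuals[i] for i in range(n)) / n
            for j in range(len(theta))]
    return loss, grad

def train(X, y, alpha=0.01, lr=0.01, momentum=0.0, steps=5000):
    """Heavy-ball training of u and v; momentum=0.0 reduces to plain GD."""
    d = len(X[0])
    u, v = [alpha] * d, [alpha] * d        # u = v = alpha, so theta starts at 0
    buf_u, buf_v = [0.0] * d, [0.0] * d    # momentum buffers start at zero
    for _ in range(steps):
        _, g = loss_and_grad(X, y, predict_weights(u, v))
        for j in range(d):
            # Chain rule: dL/du_j = 2 * u_j * g_j and dL/dv_j = -2 * v_j * g_j.
            buf_u[j] = momentum * buf_u[j] - lr * 2.0 * u[j] * g[j]
            buf_v[j] = momentum * buf_v[j] + lr * 2.0 * v[j] * g[j]
            u[j] += buf_u[j]
            v[j] += buf_v[j]
    return predict_weights(u, v)

# Hypothetical toy problem: 2 samples, 3 features, generated by theta* = (1, 0, 0).
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y = [1.0, 0.0]

theta_gd = train(X, y, momentum=0.0)   # plain gradient descent
theta_hb = train(X, y, momentum=0.9)   # heavy-ball momentum
print("GD:", theta_gd)
print("HB:", theta_hb)
```

Because the problem has more features than samples, many vectors fit the data exactly, and which one each optimizer approaches is what "implicit bias" refers to. Varying `alpha`, `lr`, and `momentum` in this sketch is one informal way to explore how the reached solution changes. It does not reproduce the paper's continuous-time analysis.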

Text
34118-Article Text-38186-1-2-20250410 - Version of Record
Restricted to Repository staff only

More information

Published date: 11 April 2025

Identifiers

Local EPrints ID: 507349
URI: http://eprints.soton.ac.uk/id/eprint/507349
ISSN: 2374-3468
PURE UUID: c5308ec5-a7c7-4676-9a80-43e30797f029
ORCID for Zhanxing Zhu: orcid.org/0000-0002-2141-6553

Catalogue record

Date deposited: 04 Dec 2025 18:00
Last modified: 05 Dec 2025 03:05

Contributors

Author: Bochen Lyu
Author: He Wang
Author: Zheng Wang
Author: Zhanxing Zhu
Editor: Toby Walsh
Editor: Julie Shah
Editor: Zico Kolter
