The University of Southampton
University of Southampton Institutional Repository

Implicit bias of (stochastic) gradient descent for rank-1 linear neural network

Neural Information Processing Systems Foundation
Lyu, Bochen
2d571283-73b1-4741-9798-15f9d144f7a6
Zhu, Zhanxing
e55e7385-8ba2-4a85-8bae-e00defb7d7f0

Lyu, Bochen and Zhu, Zhanxing (2023) Implicit bias of (stochastic) gradient descent for rank-1 linear neural network. In Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS). Neural Information Processing Systems Foundation. 24 pp.

Record type: Conference or Workshop Item (Paper)

Abstract

Studying the implicit bias of gradient descent (GD) and stochastic gradient descent (SGD) is critical to unveiling the underlying mechanism of deep learning. Unfortunately, even for standard linear networks in the regression setting, a comprehensive characterization of the implicit bias is still an open problem. This paper investigates a new proxy model for standard linear networks, the rank-1 linear network, in which each weight matrix is parameterized in rank-1 form. For the over-parameterized regression problem, we precisely analyze the implicit bias of GD and SGD by identifying a “potential” function such that GD converges to its minimizer subject to zero training error (i.e., an interpolation solution), and by further characterizing how the noise introduced by SGD perturbs the form of this potential. Our results explicitly connect the depth of the network and the initialization with the implicit bias of GD and SGD. Furthermore, we emphasize a new implicit bias of SGD, jointly induced by stochasticity and over-parameterization, which can reduce the dependence of SGD's solution on the initialization. Our findings regarding the implicit bias differ from those for a recently popular model, the diagonal linear network. We highlight that the induced bias of our rank-1 model is more consistent with that of standard linear networks, whereas that of the diagonal model is not. This suggests that the proposed rank-1 linear network might be a plausible proxy for standard linear networks.
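
To make the setting in the abstract concrete, the following is a minimal illustrative sketch, not the authors' code. It assumes one particular two-layer form that this record does not specify: a hidden weight matrix W = u v^T (rank 1) followed by output weights a, so the predictor is f(x) = (a . u) (v . x). It runs full-batch GD on an over-parameterized regression problem with fewer samples than features. The function names, the hidden width, the step size, and the synthetic data are all hypothetical choices made for illustration.

```python
import random


def dot(p, q):
    """Inner product of two equal-length lists."""
    return sum(pi * qi for pi, qi in zip(p, q))


def make_data(n=5, d=20, seed=1):
    """Synthetic over-parameterized regression data (n < d); purely illustrative."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    beta_star = [rng.gauss(0, 1) / d ** 0.5 for _ in range(d)]
    y = [dot(row, beta_star) for row in X]
    return X, y


def loss_and_grad(X, y, beta):
    """Half mean squared error and its gradient with respect to beta."""
    n = len(X)
    r = [dot(row, beta) - yi for row, yi in zip(X, y)]
    loss = 0.5 * sum(ri * ri for ri in r) / n
    g = [sum(r[i] * X[i][j] for i in range(n)) / n for j in range(len(beta))]
    return loss, g


def train_rank1(X, y, hidden=8, lr=0.005, steps=2000, seed=0):
    """Full-batch GD on the assumed model f(x) = a^T (u v^T) x."""
    rng = random.Random(seed)
    d = len(X[0])
    a = [rng.gauss(0, 1) / hidden ** 0.5 for _ in range(hidden)]
    u = [rng.gauss(0, 1) / hidden ** 0.5 for _ in range(hidden)]
    v = [rng.gauss(0, 1) / d ** 0.5 for _ in range(d)]
    losses = []
    for _ in range(steps):
        s = dot(a, u)                    # scalar gain a . u
        beta = [s * vj for vj in v]      # effective linear predictor
        loss, g = loss_and_grad(X, y, beta)
        losses.append(loss)
        gv = dot(g, v)
        # Chain rule through beta = (a . u) v; all three updates use the old values.
        a, u, v = (
            [ai - lr * gv * ui for ai, ui in zip(a, u)],
            [ui - lr * gv * ai for ai, ui in zip(a, u)],
            [vj - lr * s * gj for vj, gj in zip(v, g)],
        )
    beta = [dot(a, u) * vj for vj in v]
    losses.append(loss_and_grad(X, y, beta)[0])
    return beta, losses
```

Calling `train_rank1(*make_data())` returns the effective predictor and the training-loss history. Because the problem has more features than samples, many interpolating predictors exist, and which one training reaches depends on the parameterization and the initialization; that dependence is the implicit bias the abstract refers to. Replacing the full-batch gradient with a gradient computed on a randomly sampled subset of the rows would give a corresponding SGD variant.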

This record has no associated files available for download.

More information

Published date: 21 September 2023

Identifiers

Local EPrints ID: 486326
URI: http://eprints.soton.ac.uk/id/eprint/486326
PURE UUID: 339dd90a-620c-4fe1-bcd2-c55ecaba09e4

Catalogue record

Date deposited: 17 Jan 2024 19:41
Last modified: 09 Apr 2024 22:02

Contributors

Author: Bochen Lyu
Author: Zhanxing Zhu
