University of Southampton Institutional Repository

Memory-efficient gradient unrolling for large-scale Bi-level optimization

Shen, Qianli, Wang, Yezhen, Yang, Zhouhao, Li, Xiang, Wang, Haonan, Zhang, Yang, Scarlett, Jonathan, Zhu, Zhanxing and Kawaguchi, Kenji (2024) Memory-efficient gradient unrolling for large-scale Bi-level optimization. Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J. and Zhang, C. (eds.) In Advances in Neural Information Processing Systems 37 (NeurIPS 2024). Neural Information Processing Systems Foundation. 31 pp.

Record type: Conference or Workshop Item (Paper)

Abstract

Bi-level optimization (BO) has become a fundamental mathematical framework for addressing hierarchical machine learning problems. As deep learning models continue to grow in size, the demand for scalable bi-level optimization has become increasingly critical. Traditional gradient-based bi-level optimization algorithms, due to their inherent characteristics, are ill-suited to meet the demands of large-scale applications. In this paper, we introduce Forward Gradient Unrolling with Forward Gradient, abbreviated as (FG)²U, which achieves an unbiased stochastic approximation of the meta gradient for bi-level optimization. (FG)²U circumvents the memory and approximation issues associated with classical bi-level optimization approaches, and delivers significantly more accurate gradient estimates than existing large-scale bi-level optimization approaches. Additionally, (FG)²U is inherently designed to support parallel computing, enabling it to effectively leverage large-scale distributed computing systems to achieve significant computational efficiency. In practice, (FG)²U and other methods can be strategically placed at different stages of the training process to achieve a more cost-effective two-phase paradigm. Further, (FG)²U is easy to implement within popular deep learning frameworks, and can be conveniently adapted to address more challenging zeroth-order bi-level optimization scenarios. We provide a thorough convergence analysis and a comprehensive practical discussion for (FG)²U, complemented by extensive empirical evaluations, showcasing its superior performance in diverse large-scale bi-level optimization tasks.
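
To make the mechanism concrete: gradient unrolling approximates the inner solution by T optimization steps, giving a meta objective L(phi) = F(phi, theta_T(phi)), and a forward gradient estimates its gradient as (grad L(phi) . v) v for a random direction v ~ N(0, I), which needs only one forward-mode Jacobian-vector product through the unrolled loop instead of backpropagating through (and storing) the whole trajectory. The sketch below renders this general recipe in JAX; it is a minimal illustration under stated assumptions, not the authors' released implementation, and the toy quadratic losses, step count, and helper names (inner_loss, outer_loss, unrolled_objective, forward_gradient_estimate) are all hypothetical.

import jax
import jax.numpy as jnp

# Toy stand-ins for the inner objective f(phi, theta) and the meta
# objective F(phi, theta); a real task would substitute its own losses.
def inner_loss(phi, theta):
    return jnp.sum((theta - phi) ** 2)

def outer_loss(phi, theta):
    return jnp.sum(theta ** 2) + 0.1 * jnp.sum(phi ** 2)

def unrolled_objective(phi, theta0, T=100, lr=0.1):
    # Approximate theta*(phi) by T steps of inner gradient descent,
    # then evaluate the meta objective at the unrolled solution.
    theta = theta0
    for _ in range(T):
        theta = theta - lr * jax.grad(inner_loss, argnums=1)(phi, theta)
    return outer_loss(phi, theta)

def forward_gradient_estimate(phi, theta0, key, num_dirs=8):
    # For each direction v, a single forward-mode JVP through the whole
    # unrolled loop yields the directional derivative dL/dphi . v without
    # storing the trajectory; (dL/dphi . v) v is unbiased for v ~ N(0, I).
    vs = jax.random.normal(key, (num_dirs,) + phi.shape)

    def one_direction(v):
        _, dl_dv = jax.jvp(lambda p: unrolled_objective(p, theta0), (phi,), (v,))
        return dl_dv * v

    # Directions are independent, so they vectorize (vmap) on one device
    # and shard across devices just as easily.
    return jnp.mean(jax.vmap(one_direction)(vs), axis=0)

phi = jnp.ones(4)
g = forward_gradient_estimate(phi, jnp.zeros(4), jax.random.PRNGKey(0))

Averaging more directions trades extra parallel compute for lower estimator variance, which is the lever behind the abstract's distributed-computing claim.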

This record has no associated files available for download.

More information

Accepted/In Press date: December 2024
Published date: 2024

Identifiers

Local EPrints ID: 500723
URI: http://eprints.soton.ac.uk/id/eprint/500723
PURE UUID: d3b0ba75-107b-4cfe-a28b-f33a00819809
ORCID for Zhanxing Zhu: orcid.org/0000-0002-2141-6553

Catalogue record

Date deposited: 12 May 2025 16:37
Last modified: 23 May 2025 02:10

Contributors

Author: Qianli Shen
Author: Yezhen Wang
Author: Zhouhao Yang
Author: Xiang Li
Author: Haonan Wang
Author: Yang Zhang
Author: Jonathan Scarlett
Author: Zhanxing Zhu
Author: Kenji Kawaguchi
Editor: A. Globerson
Editor: L. Mackey
Editor: D. Belgrave
Editor: A. Fan
Editor: U. Paquet
Editor: J. Tomczak
Editor: C. Zhang
