University of Southampton Institutional Repository

Memory-efficient gradient unrolling for large-scale Bi-level optimization

Shen, Qianli, Wang, Yezhen, Yang, Zhouhao, Li, Xiang, Wang, Haonan, Zhang, Yang, Scarlett, Jonathan, Zhu, Zhanxing and Kawaguchi, Kenji (2024) Memory-efficient gradient unrolling for large-scale Bi-level optimization. Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J. and Zhang, C. (eds.) In Advances in Neural Information Processing Systems 37 (NeurIPS 2024). Neural Information Processing Systems Foundation. 31 pp.

Record type: Conference or Workshop Item (Paper)

Abstract

Bi-level optimization (BO) has become a fundamental mathematical framework for addressing hierarchical machine learning problems. As deep learning models continue to grow in size, the demand for scalable bi-level optimization has become increasingly critical. Traditional gradient-based bi-level optimization algorithms, due to their inherent characteristics, are ill-suited to meet the demands of large-scale applications. In this paper, we introduce Forward Gradient Unrolling with Forward Gradient, abbreviated as (FG)²U, which achieves an unbiased stochastic approximation of the meta gradient for bi-level optimization. (FG)²U circumvents the memory and approximation issues associated with classical bi-level optimization approaches, and delivers significantly more accurate gradient estimates than existing large-scale bi-level optimization approaches. Additionally, (FG)²U is inherently designed to support parallel computing, enabling it to effectively leverage large-scale distributed computing systems to achieve significant computational efficiency. In practice, (FG)²U and other methods can be strategically placed at different stages of the training process to achieve a more cost-effective two-phase paradigm. Further, (FG)²U is easy to implement within popular deep learning frameworks, and can be conveniently adapted to address more challenging zeroth-order bi-level optimization scenarios. We provide a thorough convergence analysis and a comprehensive practical discussion for (FG)²U, complemented by extensive empirical evaluations, showcasing its superior performance in diverse large-scale bi-level optimization tasks.
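
To make the mechanism concrete: gradient unrolling approximates the inner solution by T optimization steps, giving a meta objective L(phi) = F(phi, theta_T(phi)), and a forward gradient estimates its gradient as (grad L(phi) . v) v for a random direction v ~ N(0, I), which needs only one forward-mode Jacobian-vector product through the unrolled loop instead of backpropagating through (and storing) the whole trajectory. The sketch below renders this general recipe in JAX; it is a minimal illustration under stated assumptions, not the authors' released implementation, and the toy quadratic losses, step count, and helper names (inner_loss, outer_loss, unrolled_objective, forward_gradient_estimate) are all hypothetical.

import jax
import jax.numpy as jnp

# Toy stand-ins for the inner objective f(phi, theta) and the meta
# objective F(phi, theta); a real task would substitute its own losses.
def inner_loss(phi, theta):
    return jnp.sum((theta - phi) ** 2)

def outer_loss(phi, theta):
    return jnp.sum(theta ** 2) + 0.1 * jnp.sum(phi ** 2)

def unrolled_objective(phi, theta0, T=100, lr=0.1):
    # Approximate theta*(phi) by T steps of inner gradient descent,
    # then evaluate the meta objective at the unrolled solution.
    theta = theta0
    for _ in range(T):
        theta = theta - lr * jax.grad(inner_loss, argnums=1)(phi, theta)
    return outer_loss(phi, theta)

def forward_gradient_estimate(phi, theta0, key, num_dirs=8):
    # For each direction v, a single forward-mode JVP through the whole
    # unrolled loop yields the directional derivative dL/dphi . v without
    # storing the trajectory; (dL/dphi . v) v is unbiased for v ~ N(0, I).
    vs = jax.random.normal(key, (num_dirs,) + phi.shape)

    def one_direction(v):
        _, dl_dv = jax.jvp(lambda p: unrolled_objective(p, theta0), (phi,), (v,))
        return dl_dv * v

    # Directions are independent, so they vectorize (vmap) on one device
    # and shard across devices just as easily.
    return jnp.mean(jax.vmap(one_direction)(vs), axis=0)

phi = jnp.ones(4)
g = forward_gradient_estimate(phi, jnp.zeros(4), jax.random.PRNGKey(0))

Averaging more directions trades extra parallel compute for lower estimator variance, which is the lever behind the abstract's distributed-computing claim.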

This record has no associated files available for download.

More information

Accepted/In Press date: December 2024
Published date: 2024

Identifiers

Local EPrints ID: 500723
URI: http://eprints.soton.ac.uk/id/eprint/500723
PURE UUID: d3b0ba75-107b-4cfe-a28b-f33a00819809
ORCID for Zhanxing Zhu: orcid.org/0000-0002-2141-6553

Catalogue record

Date deposited: 12 May 2025 16:37
Last modified: 23 May 2025 02:10

Contributors

Author: Qianli Shen
Author: Yezhen Wang
Author: Zhouhao Yang
Author: Xiang Li
Author: Haonan Wang
Author: Yang Zhang
Author: Jonathan Scarlett
Author: Zhanxing Zhu
Author: Kenji Kawaguchi
Editor: A. Globerson
Editor: L. Mackey
Editor: D. Belgrave
Editor: A. Fan
Editor: U. Paquet
Editor: J. Tomczak
Editor: C. Zhang
