Implicit bias of adversarial training for deep neural networks
Lyu, Bochen
Zhu, Zhanxing
25 April 2022
Lyu, Bochen and Zhu, Zhanxing
(2022)
Implicit bias of adversarial training for deep neural networks.
10th International Conference on Learning Representations, ICLR 2022, Virtual, Online.
25 - 29 Apr 2022.
Record type:
Conference or Workshop Item
(Paper)
Abstract
We provide a theoretical understanding of the implicit bias imposed by adversarial training on homogeneous deep neural networks without any explicit regularization. In particular, for deep linear networks adversarially trained by gradient descent on a linearly separable dataset, we prove that the direction of the product of the weight matrices converges to the direction of the max-margin solution of the original dataset. Furthermore, we generalize this result to adversarial training of non-linear homogeneous deep neural networks without requiring linear separability of the dataset. We show that, when the neural network is adversarially trained with ℓ2 or ℓ∞ FGSM, FGM, and PGD perturbations, the normalized parameters of the network along the gradient-flow trajectory converge in direction to a KKT point of a constrained optimization problem that maximizes the margin on the adversarial examples. Our results theoretically justify the longstanding conjecture that adversarial training modifies the decision boundary by utilizing adversarial examples to improve robustness, and potentially provide insights for designing new robust training strategies.
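For intuition, the setting the abstract analyzes can be sketched numerically. The following is a minimal illustration, not code from the paper: it adversarially trains a linear classifier f(x) = ⟨w, x⟩ with the exponential loss and ℓ∞ FGSM perturbations on a linearly separable 2-D dataset, and prints the normalized direction w/‖w‖, which is what the paper's results say should stabilize. The dataset, perturbation budget, learning rate, and loss choice are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X_pos = rng.normal(size=(40, 2)) + 2.0   # positive class, centred at (2, 2)
X_neg = rng.normal(size=(40, 2)) - 2.0   # negative class, centred at (-2, -2)
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(40), -np.ones(40)])

eps, lr = 0.1, 0.01                      # perturbation budget and step size
w = rng.normal(size=2)

for step in range(20001):
    # l_inf FGSM step on the exponential loss exp(-y <w, x>):
    # grad_x = -y * w * exp(-y <w, x>), so sign(grad_x) = -y * sign(w).
    X_adv = X - eps * y[:, None] * np.sign(w)[None, :]
    margins = y * (X_adv @ w)
    # Gradient descent on the adversarial loss, treating X_adv as fixed
    # (the attack's sign() is piecewise constant in w).
    grad_w = -(y[:, None] * X_adv * np.exp(-margins)[:, None]).mean(axis=0)
    w -= lr * grad_w
    if step % 5000 == 0:
        print(step, w / np.linalg.norm(w))   # direction stabilizes over time
```

In this linear toy case the FGSM perturbation shrinks every margin by exactly eps * ‖w‖₁, which is the mechanism behind the paper's claim that the limiting direction solves a margin-maximization problem on the adversarial examples rather than the clean ones.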
More information
Published date: 25 April 2022
Additional Information:
Funding Information: This project is supported by the Beijing Nova Program (No. 202072) from the Beijing Municipal Science and Technology Commission.
Venue - Dates:
10th International Conference on Learning Representations, ICLR 2022, Virtual, Online, 2022-04-25 - 2022-04-29
Identifiers
Local EPrints ID: 486052
URI: http://eprints.soton.ac.uk/id/eprint/486052
PURE UUID: 32a94e35-7781-4df8-a4ad-06cf15877060
Catalogue record
Date deposited: 08 Jan 2024 17:34
Last modified: 17 Mar 2024 13:42
Contributors
Author:
Bochen Lyu
Author:
Zhanxing Zhu