The University of Southampton
University of Southampton Institutional Repository

Analyzing the implicit bias of adversarial training from a generalized margin perspective


Zhu, Zhanxing (2025) Analyzing the implicit bias of adversarial training from a generalized margin perspective. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47 (9), 8025-8039. (doi:10.1109/TPAMI.2025.3575618).

Record type: Article

Abstract

Adversarial training has been empirically demonstrated to be an effective strategy for improving the robustness of deep neural networks (DNNs) against adversarial examples. However, the underlying reason for its effectiveness remains opaque. In this paper, we conduct an extensive theoretical and empirical analysis of the implicit bias induced by adversarial training from a generalized margin perspective. Our results focus on adversarial training for homogeneous DNNs. In particular, (i) for deep linear networks with ℓp-norm perturbation, we show that the weight matrices of adjacent layers become aligned and that the converged parameters maximize the margin of the adversarial examples, which can further be viewed as a generalized margin of the original dataset, achieved by an interpolation between the ℓ2-SVM and ℓq-SVM solutions, where 1/p + 1/q = 1. (ii) For general homogeneous DNNs, both linear and nonlinear, we investigate adversarial training with a variety of adversarial perturbations in a unified manner. Specifically, we show that the direction of the limit point of the parameters converges to a KKT point of a constrained optimization problem that maximizes the margin for adversarial examples. Additionally, applying this general result to two special linear homogeneous DNNs, diagonal linear networks and linear convolutional networks, we show that adversarial training with ℓp-norm perturbation equivalently minimizes an interpolation norm in predictor space that depends on the depth, the architecture, and the value of p. Extensive experiments verify the theoretical claims. Our results provide a theoretical basis for the longstanding folklore (Madry et al., 2018) that adversarial training modifies the decision boundary by utilizing adversarial examples to improve robustness, and they potentially offer insights for designing new robust training strategies.
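The connection between ℓp-norm perturbations and an ℓq dual norm in the abstract rests on a standard identity for linear models: the worst-case margin under an ℓp-bounded perturbation is min over ‖δ‖p ≤ ε of w·(x + δ) = w·x − ε‖w‖q with 1/p + 1/q = 1. A minimal numerical sketch of this identity (toy values and function names are illustrative, not from the paper) for p = ∞ (q = 1) and p = 2 (q = 2):

```python
# Numerical check of the worst-case linear margin identity
#   min_{||delta||_p <= eps} w . (x + delta) = w . x - eps * ||w||_q,  1/p + 1/q = 1,
# which underlies the interpolation between l2-SVM and lq-SVM described above.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def worst_case_margin_linf(w, x, eps):
    # For ||delta||_inf <= eps, the minimizer is delta_i = -eps * sign(w_i),
    # giving robust margin w.x - eps * ||w||_1 (q = 1).
    delta = [-eps * (1 if wi > 0 else -1 if wi < 0 else 0) for wi in w]
    return dot(w, x) + dot(w, delta)

def worst_case_margin_l2(w, x, eps):
    # For ||delta||_2 <= eps, the minimizer is delta = -eps * w / ||w||_2,
    # giving robust margin w.x - eps * ||w||_2 (q = 2).
    norm = math.sqrt(dot(w, w))
    delta = [-eps * wi / norm for wi in w]
    return dot(w, x) + dot(w, delta)

w, x, eps = [1.0, -2.0, 0.5], [0.3, 0.1, -0.4], 0.1
l1_norm = sum(abs(wi) for wi in w)            # ||w||_1 = 3.5
l2_norm = math.sqrt(dot(w, w))                # ||w||_2 = sqrt(5.25)

assert abs(worst_case_margin_linf(w, x, eps) - (dot(w, x) - eps * l1_norm)) < 1e-12
assert abs(worst_case_margin_l2(w, x, eps) - (dot(w, x) - eps * l2_norm)) < 1e-12
```

The same duality explains why, in the linear case, maximizing the margin of adversarial examples amounts to trading off the clean margin against an ℓq penalty on the weights.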

Text
Analyzing_the_Implicit_Bias_of_Adversarial_Training_From_a_Generalized_Margin_Perspective - Version of Record
Restricted to Repository staff only

More information

e-pub ahead of print date: 2 June 2025
Published date: September 2025

Identifiers

Local EPrints ID: 509649
URI: http://eprints.soton.ac.uk/id/eprint/509649
ISSN: 1939-3539
PURE UUID: c9c73c63-0d8f-46b2-9feb-54ce7661bf66
ORCID for Zhanxing Zhu: orcid.org/0000-0002-2141-6553

Catalogue record

Date deposited: 27 Feb 2026 17:43
Last modified: 28 Feb 2026 03:14


Contributors

Author: Zhanxing Zhu


