University of Southampton Institutional Repository

Interpreting adversarially trained convolutional neural networks


Zhang, Tianyuan and Zhu, Zhanxing (2019) Interpreting adversarially trained convolutional neural networks. In, 36th International Conference on Machine Learning (ICML 2019). (Proceedings of Machine Learning Research, 97) 36th International Conference on Machine Learning (09/06/19 - 15/06/19) International Machine Learning Society, pp. 12951-12966.

Record type: Book Section

Abstract

We attempt to interpret how adversarially trained convolutional neural networks (AT-CNNs) recognize objects. We design systematic approaches to interpret AT-CNNs in both qualitative and quantitative ways and compare them with normally trained models. Surprisingly, we find that adversarial training alleviates the texture bias of standard CNNs when trained on object recognition tasks, and helps CNNs learn a more shape-biased representation. We validate our hypothesis from two aspects. First, we compare the salience maps of AT-CNNs and standard CNNs on clean images and on images under different transformations. The comparison visually shows that the predictions of the two types of CNNs are sensitive to dramatically different types of features. Second, for quantitative verification, we construct additional test datasets that destroy either textures or shapes, such as style-transferred versions of clean data, saturated images and patch-shuffled ones, and then evaluate the classification accuracy of AT-CNNs and normal CNNs on these datasets. Our findings shed some light on why AT-CNNs are more robust than normally trained ones and contribute to a better understanding of adversarial training over CNNs from an interpretation perspective.
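
For intuition, below is a minimal sketch of two of the texture/shape-destroying transformations mentioned in the abstract: image saturation and patch shuffling. This is an illustrative reconstruction, not the authors' reference code; the function names, the exact saturation mapping, and the grid size k are assumptions.

import numpy as np

def saturate(img, p):
    # Assumed saturation mapping: push pixel values in [0, 1] toward
    # the extremes. p = 2 is the identity; larger p flattens mid-tone
    # texture detail while keeping object contours.
    x = 2.0 * img - 1.0                      # rescale to [-1, 1]
    x = np.sign(x) * np.abs(x) ** (2.0 / p)  # smooth saturation
    return (x + 1.0) / 2.0                   # back to [0, 1]

def patch_shuffle(img, k, rng=None):
    # Split the image into a k x k grid and randomly permute the
    # patches: local texture statistics survive, global shape does not.
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    ph, pw = h // k, w // k
    patches = [img[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
               for i in range(k) for j in range(k)]
    perm = rng.permutation(k * k)
    rows = [np.concatenate([patches[idx] for idx in perm[r * k:(r + 1) * k]], axis=1)
            for r in range(k)]
    return np.concatenate(rows, axis=0)

# Example: img is a float image in [0, 1] of shape (H, W, C).
img = np.random.rand(224, 224, 3)
texture_suppressed = saturate(img, p=16)   # shape kept, texture removed
shape_destroyed = patch_shuffle(img, k=4)  # texture kept, shape removed

A shape-biased model should degrade more on patch-shuffled images than on saturated ones, and vice versa for a texture-biased model, which is how the accuracy comparison described in the abstract separates the two biases.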

This record has no associated files available for download.

More information

Published date: 2019
Venue - Dates: 36th International Conference on Machine Learning, Long Beach Convention Center, Long Beach, United States, 2019-06-09 - 2019-06-15

Identifiers

Local EPrints ID: 486048
URI: http://eprints.soton.ac.uk/id/eprint/486048
PURE UUID: 01bda4cc-4114-491e-9dc4-1fa0e2693e4b

Catalogue record

Date deposited: 08 Jan 2024 17:33
Last modified: 17 Mar 2024 06:41

Contributors

Author: Tianyuan Zhang
Author: Zhanxing Zhu


