Protecting publicly available data with machine learning shortcuts
Müller, Nicolas M.
e054cb2d-3ad5-4674-b44e-406a6c2c1dfe
Burgert, Maximilian
e3dbc52a-6fcc-4de3-af3b-35db5e8b1e8c
Debus, Pascal
77adedc3-fcf2-468a-beab-106ae07156fa
Williams, Jennifer
3a1568b4-8a0b-41d2-8635-14fe69fbb360
Sperl, Philip
2d9a03d7-ae76-4c3a-bf9e-96d3fb99560d
Böttinger, Konstantin
ec031c04-8af1-411a-871b-e31201458053
20 November 2023
Müller, Nicolas M., Burgert, Maximilian, Debus, Pascal, Williams, Jennifer, Sperl, Philip and Böttinger, Konstantin (2023) Protecting publicly available data with machine learning shortcuts. British Machine Vision Conference 2023, Aberdeen, United Kingdom. 20 - 24 Nov 2023. 12 pp.
Record type: Conference or Workshop Item (Paper)
Abstract
Machine-learning (ML) shortcuts, or spurious correlations, are artifacts in datasets that lead to very good training and test performance but severely limit the model's generalization capability. Such shortcuts are insidious because good in-domain test performance lets them go unnoticed. In this paper, we explore the influence of different shortcuts and show that even simple shortcuts are difficult to detect with explainable AI methods. We then exploit this fact and design an approach to defend online databases against crawlers: providers such as dating platforms, clothing manufacturers, or used car dealers have to deal with a professionalized crawling industry that grabs and resells data points on a large scale. We show that a deterrent can be created by deliberately adding ML shortcuts. Datasets augmented in this way are unusable for ML use cases, which deters crawlers and the unauthorized use of data from the internet. Using real-world data from three use cases, we show that the proposed approach renders such collected data unusable, while the shortcut remains difficult for humans to notice. Thus, our proposed approach can serve as proactive protection against illegitimate data crawling.
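The effect described in the abstract can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration, since this record does not describe the shortcuts, data, or models used in the paper: the synthetic 8x8 "images", the shortcut (overwriting one pixel with a faintly label-dependent value), the least-squares linear classifier, and all function names are hypothetical.

```python
import numpy as np


def make_images(n, rng):
    """Toy 8x8 'images': Gaussian noise plus a weak genuine class signal."""
    labels = rng.integers(0, 2, size=n)
    images = rng.normal(0.5, 0.1, size=(n, 8, 8))
    # Genuine signal (assumed): the right half is slightly brighter for class 1.
    images[:, :, 4:] += 0.05 * labels[:, None, None]
    return images, labels


def add_shortcut(images, labels, strength=0.05):
    """Hypothetical shortcut: overwrite pixel (0, 0) with a label-dependent value.

    The value is 0.5 - strength for class 0 and 0.5 + strength for class 1,
    which lies well inside the normal range of pixel noise.
    """
    out = images.copy()
    out[:, 0, 0] = 0.5 + strength * (2 * labels - 1)
    return out


def _design_matrix(images):
    """Flatten images and append a constant column for the intercept."""
    flat = images.reshape(len(images), -1)
    return np.hstack([flat, np.ones((len(images), 1))])


def fit_linear(images, labels):
    """Least-squares linear classifier (a stand-in for any learned model)."""
    weights, *_ = np.linalg.lstsq(
        _design_matrix(images), labels.astype(float), rcond=None
    )
    return weights


def accuracy(weights, images, labels):
    """Fraction of samples whose thresholded prediction matches the label."""
    predictions = _design_matrix(images) @ weights > 0.5
    return float(np.mean(predictions == labels))


rng = np.random.default_rng(0)
train_x, train_y = make_images(2000, rng)
test_x, test_y = make_images(2000, rng)

# Model trained on the original data learns the genuine signal.
clean_model = fit_linear(train_x, train_y)
# Model trained on shortcut-augmented ("protected") data latches onto the shortcut.
protected_model = fit_linear(add_shortcut(train_x, train_y), train_y)

acc_clean_model = accuracy(clean_model, test_x, test_y)
acc_in_domain = accuracy(protected_model, add_shortcut(test_x, test_y), test_y)
acc_out_of_domain = accuracy(protected_model, test_x, test_y)

print(f"clean model on clean test data:          {acc_clean_model:.3f}")
print(f"protected-data model on shortcut data:   {acc_in_domain:.3f}")
print(f"protected-data model on clean test data: {acc_out_of_domain:.3f}")
```

In this sketch the model trained on the augmented data scores well only as long as the shortcut is present. On data without it, the model has nothing to rely on, which is the deterrent effect the abstract describes. A least-squares fit is used because it makes the shortcut reliance easy to reason about; it is not a claim about the architectures evaluated in the paper.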
This record has no associated files available for download.
More information
Published date: 20 November 2023
Venue - Dates: British Machine Vision Conference 2023, Aberdeen, United Kingdom, 2023-11-20 - 2023-11-24
Identifiers
Local EPrints ID: 502058
URI: http://eprints.soton.ac.uk/id/eprint/502058
PURE UUID: 5650fd3e-2426-4e64-881f-18aede5fa257
Catalogue record
Date deposited: 16 Jun 2025 16:32
Last modified: 17 Jun 2025 02:05
Contributors
Author: Nicolas M. Müller
Author: Maximilian Burgert
Author: Pascal Debus
Author: Jennifer Williams
Author: Philip Sperl
Author: Konstantin Böttinger