University of Southampton Institutional Repository

Model monitoring in the absence of labeled data via feature attributions distributions


Mougan, Carlos (2025) Model monitoring in the absence of labeled data via feature attributions distributions. University of Southampton, Doctoral Thesis, 155pp.

Record type: Thesis (Doctoral)

Abstract

Model monitoring involves analyzing AI algorithms once they have been deployed and detecting changes in their behaviour.
This thesis explores machine learning (ML) model monitoring before predictions impact real-world decisions or users. This step is characterized by one particular condition: the absence of labelled data at test time, which makes it challenging, and often impossible, to calculate performance metrics.

The thesis is structured around two main themes: \emph{(i) AI alignment}, measuring whether AI models behave in a manner consistent with human values, and \emph{(ii) performance monitoring}, measuring whether models achieve specific accuracy goals or desiderata.

A common methodology unifies all sections of the thesis: the analysis of feature attribution distributions for both monitoring dimensions. By exploiting the theoretical properties of these feature attribution explanations, we derive guarantees and insights for model monitoring.

For AI alignment, we explore whether the distributions of feature attributions differ across social groups and propose a new formalization of equal treatment. This novel metric assesses how well AI decisions adhere to ethical standards and political-philosophical values. Our notion of Equal Treatment tests for statistical independence of the explanation distributions over populations with different protected characteristics. We show the theoretical properties of our formalization of equal treatment and devise an equal treatment inspector based on the AUC of a classifier two-sample test.
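The following minimal sketch illustrates this idea as a classifier two-sample test on the explanation distribution. It uses SHAP attributions and scikit-learn with entirely synthetic data and illustrative model and feature choices (it is not the thesis code): an audited model is trained without the protected attribute, its feature attributions are computed, and a second classifier tries to predict the protected group from those attributions; an AUC near 0.5 is consistent with equal treatment, while a clearly higher AUC is not.

# Equal-treatment check via a classifier two-sample test on SHAP explanation
# distributions. All data, feature names, and model choices are illustrative.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 2, size=n)            # protected characteristic (not a model input)
X = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n) + group,         # proxy feature correlated with the group
})
y = ((X["x1"] + X["x2"] + rng.normal(scale=0.5, size=n)) > 1).astype(int)

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(X, y, group, random_state=0)

# 1) Train the model to be audited (without the protected attribute).
model = GradientBoostingClassifier().fit(X_tr, y_tr)

# 2) Compute feature attributions (the explanation distribution) on held-out data.
explainer = shap.TreeExplainer(model)
S_te = pd.DataFrame(explainer.shap_values(X_te), columns=X.columns)

# 3) Classifier two-sample test: predict the protected group from the explanations.
#    AUC close to 0.5 suggests equal treatment; a clearly higher AUC does not.
S_a, S_b, g_a, g_b = train_test_split(S_te, g_te, random_state=0)
inspector = LogisticRegression().fit(S_a, g_a)
auc = roc_auc_score(g_b, inspector.predict_proba(S_b)[:, 1])
print(f"Equal-treatment AUC: {auc:.3f}")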

For performance monitoring, we define \emph{explanation shift} as the statistical comparison between how predictions on training data are explained and how predictions on new data are explained. We propose explanation shift as a key indicator for investigating the interaction between distribution shifts and learned models. We introduce an Explanation Shift Detector that operates on explanation distributions and provides a more sensitive and explainable signal of changes in that interaction. Compared with methods based on input distribution shift, monitoring for explanation shift yields more sensitive indicators of varying model behaviour. We provide theoretical and experimental evidence and demonstrate the effectiveness of our approach on synthetic and real data.
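As a rough illustration of the same mechanism applied to performance monitoring, the sketch below (again with synthetic data and illustrative model choices, not the released implementation) compares the SHAP explanations of a model on reference data against its explanations on shifted data via a classifier two-sample test; the AUC of the detector acts as the explanation-shift indicator.

# Explanation-shift check: compare how the model's predictions are explained on
# reference data vs. new data via a classifier two-sample test on SHAP values.
# Data generation and model choices are illustrative assumptions.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000
X_ref = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
y_ref = X_ref["x1"] * X_ref["x2"] + rng.normal(scale=0.1, size=n)

# New data with a covariate shift that changes how the learned interaction is used.
X_new = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(loc=2.0, size=n)})

model = GradientBoostingRegressor().fit(X_ref, y_ref)
explainer = shap.TreeExplainer(model)
S_ref = pd.DataFrame(explainer.shap_values(X_ref), columns=X_ref.columns)
S_new = pd.DataFrame(explainer.shap_values(X_new), columns=X_new.columns)

# Label reference explanations 0 and new-data explanations 1, then measure how
# well a detector separates them. AUC ~ 0.5 means no explanation shift; higher
# values indicate a shift in how the model uses its features on the new data.
S = pd.concat([S_ref, S_new], ignore_index=True)
z = np.concatenate([np.zeros(len(S_ref)), np.ones(len(S_new))])
S_tr, S_te, z_tr, z_te = train_test_split(S, z, random_state=0, stratify=z)
detector = LogisticRegression().fit(S_tr, z_tr)
print("Explanation-shift AUC:", roc_auc_score(z_te, detector.predict_proba(S_te)[:, 1]))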

Finally, to explain model degradation, we use a second model that predicts the uncertainty estimates of the first.
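The abstract does not detail how the uncertainty estimates are obtained, so the sketch below makes an illustrative assumption: it approximates the first model's uncertainty by the disagreement of a bootstrapped ensemble, then fits a second, interpretable model to predict that uncertainty from the input features, so that its feature importances hint at what drives the degradation.

# Explaining degradation with a second model. The uncertainty of the first model
# is approximated here by the disagreement of a bootstrapped ensemble (an
# illustrative choice); the second model then predicts that uncertainty from the
# inputs so its feature importances point at likely drivers of degradation.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=2000)

# First model(s): a small bootstrapped ensemble whose spread serves as an
# uncertainty estimate.
ensemble = []
for seed in range(10):
    Xb, yb = resample(X, y, random_state=seed)
    ensemble.append(GradientBoostingRegressor(random_state=seed).fit(Xb, yb))

X_new = rng.normal(loc=1.0, size=(2000, 3))   # shifted deployment data, no labels
preds = np.stack([m.predict(X_new) for m in ensemble])
uncertainty = preds.std(axis=0)

# Second model: predicts the first model's uncertainty from the raw features.
explainer_model = DecisionTreeRegressor(max_depth=3).fit(X_new, uncertainty)
print(dict(zip(["x1", "x2", "x3"], explainer_model.feature_importances_.round(3))))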

Additionally, we release two open-source Python packages, \texttt{skshift} and \texttt{explanationspace}, which implement our methods and provide usage tutorials for further reproducibility.

Text
PhD_Carlos_Mougan (1) - Version of Record
Available under License University of Southampton Thesis Licence.
Download (3MB)
Text
Final-thesis-submission-Examination-Mr-Carlos-Mougan (2)
Restricted to Repository staff only

More information

Published date: February 2025

Identifiers

Local EPrints ID: 498652
URI: http://eprints.soton.ac.uk/id/eprint/498652
PURE UUID: 3729e041-bcc0-4512-a763-ce34c491e0af
ORCID for Steffen Staab: orcid.org/0000-0002-0780-4154
ORCID for Thanassis Tiropanis: orcid.org/0000-0002-6195-2852

Catalogue record

Date deposited: 25 Feb 2025 17:31
Last modified: 22 Aug 2025 02:13


Contributors

Author: Carlos Mougan
Thesis advisor: Steffen Staab
Thesis advisor: Thanassis Tiropanis
