Model monitoring in the absence of labeled data via feature attributions distributions
Mougan, Carlos (2025) Model monitoring in the absence of labeled data via feature attributions distributions. University of Southampton, Doctoral Thesis, 155pp.
Record type: Thesis (Doctoral)
Abstract
Model monitoring involves analyzing AI algorithms once they have been deployed and detecting changes in their behaviour.
This thesis explores monitoring machine learning (ML) models before their predictions impact real-world decisions or users. This step is characterized by one particular condition: the absence of labelled data at test time, which makes it challenging, and often impossible, to calculate performance metrics.
The thesis is structured around two main themes: \emph{(i) AI alignment}, measuring whether AI models behave in a manner consistent with human values, and \emph{(ii) performance monitoring}, measuring whether the models achieve their intended accuracy goals.
A common methodology unifies all parts of the thesis: the analysis of feature attribution distributions for both monitoring dimensions. By exploiting the theoretical properties of these feature attribution explanations, we derive guarantees and insights for model monitoring.
For AI Alignment, we explore whether the distributions of feature attributions are distinct for different social groups and propose a new formalization of equal treatment. This novel metric assesses how well AI decisions adhere to ethical standards and political-philosophical values. Our notion of Equal Treatment tests for statistical independence of the explanation distributions over populations with different protected characteristics. We show the theoretical properties of our formalization of equal treatment and devise an equal treatment inspector based on the AUC of a classifier two-sample test.
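As an illustration of how such an inspector can be built, the sketch below applies a classifier two-sample test to SHAP explanation distributions: a classifier is trained to predict the protected attribute from the explanations alone, and its AUC serves as the equal treatment statistic. The data, models, and library choices here are assumptions for illustration, not the implementation released with the thesis.

```python
# Illustrative sketch of an equal treatment inspector via a classifier two-sample test
# on feature attribution (SHAP) distributions. Data, models, and library choices are
# assumptions for illustration, not the implementation released with the thesis.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000
z = rng.integers(0, 2, size=n)                 # protected attribute (not a model input)
x1 = rng.normal(size=n) + 0.8 * z              # proxy feature correlated with z
x2 = rng.normal(size=n)
X = np.column_stack([x1, x2])
y = (x1 + x2 + rng.normal(scale=0.3, size=n) > 0).astype(int)

# 1. Fit the model under inspection (the protected attribute is excluded from its inputs).
model = GradientBoostingClassifier().fit(X, y)

# 2. Compute the explanation distribution: one vector of feature attributions per instance.
explanations = shap.TreeExplainer(model).shap_values(X)

# 3. Classifier two-sample test: predict the protected attribute from explanations alone.
#    AUC close to 0.5 means the explanation distributions are statistically
#    indistinguishable across groups (equal treatment); higher AUC flags a violation.
E_tr, E_te, z_tr, z_te = train_test_split(explanations, z, test_size=0.5, random_state=0)
inspector = LogisticRegression(max_iter=1000).fit(E_tr, z_tr)
auc = roc_auc_score(z_te, inspector.predict_proba(E_te)[:, 1])
print(f"Equal treatment AUC: {auc:.3f}")
```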
For performance monitoring, we define \emph{explanation shift} as the statistical comparison between how predictions from training data are explained and how predictions on new data are explained. We propose explanation shift as a key indicator for investigating the interaction between distribution shifts and learned models. We introduce an Explanation Shift Detector that operates on explanation distributions, providing a more sensitive and explainable indicator of changes in the interaction between distribution shifts and learned models. We compare explanation shift with methods based on distribution shift alone, showing that monitoring for explanation shift yields more sensitive indicators of varying model behaviour. We provide theoretical and experimental evidence and demonstrate the effectiveness of our approach on synthetic and real data.
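A minimal sketch of this idea, again under assumed data and model choices rather than the released implementation: the detector is a classifier two-sample test that tries to distinguish the explanations of training-distribution data from the explanations of incoming, unlabelled data, with its AUC quantifying explanation shift.

```python
# Illustrative sketch of an explanation shift detector: a classifier two-sample test that
# tries to separate explanations of training-distribution data from explanations of new,
# unlabelled data. All names and data here are assumptions, not the released implementation.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X_train = rng.normal(size=(2000, 4))
y_train = X_train[:, 0] * X_train[:, 1] + rng.normal(scale=0.1, size=2000)
X_new = rng.normal(loc=1.5, size=(2000, 4))     # shifted data arriving at test time, no labels

model = GradientBoostingRegressor().fit(X_train, y_train)
explainer = shap.TreeExplainer(model)

# Explanation distributions for in-distribution and incoming data.
S_train = explainer.shap_values(X_train)
S_new = explainer.shap_values(X_new)

# Label each explanation by origin and train a domain classifier on the explanations.
S = np.vstack([S_train, S_new])
origin = np.concatenate([np.zeros(len(S_train)), np.ones(len(S_new))])
S_tr, S_te, o_tr, o_te = train_test_split(
    S, origin, test_size=0.5, random_state=1, stratify=origin
)

detector = LogisticRegression(max_iter=1000).fit(S_tr, o_tr)
auc = roc_auc_score(o_te, detector.predict_proba(S_te)[:, 1])
print(f"Explanation shift AUC: {auc:.3f}")      # ~0.5: no shift in model behaviour; >>0.5: shift

# Inspecting the detector (e.g. its coefficients, or its own feature attributions)
# indicates which features drive the change, which makes the indicator explainable.
```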
Finally, to explain model degradation, we use a second model that predicts the uncertainty estimates of the first.
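One way to realize this idea is sketched below under explicit assumptions (the uncertainty proxy, the models, and the data are illustrative, not the exact procedure from the thesis): approximate the first model's uncertainty on held-out data and fit a second model to predict that quantity from the features, so it can be evaluated, and explained via feature attributions, on unlabelled incoming data.

```python
# Illustrative sketch: a second model predicts an uncertainty proxy of the first model.
# The proxy (held-out absolute error) and all modelling choices are assumptions made for
# this example, not the exact procedure described in the thesis.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(3000, 4))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.2, size=3000)
X_fit, X_hold, y_fit, y_hold = train_test_split(X, y, test_size=0.5, random_state=2)

# First model: the deployed predictor.
first = GradientBoostingRegressor().fit(X_fit, y_fit)

# Uncertainty proxy on held-out data: absolute prediction error of the first model.
uncertainty = np.abs(y_hold - first.predict(X_hold))

# Second model: learns where the first model is uncertain, using only the features,
# so it can be applied at test time when labels are unavailable.
second = GradientBoostingRegressor().fit(X_hold, uncertainty)

# At deployment, high predicted uncertainty flags likely degradation on incoming data,
# and feature attributions of `second` can explain which features are responsible.
X_incoming = rng.normal(loc=1.0, size=(5, 4))   # shifted, unlabelled data
print(second.predict(X_incoming))
```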
Additionally, we release two open-source Python packages, \texttt{skshift} and \texttt{explanationspace}, which implement our methods and provide usage tutorials for further reproducibility.
Text: PhD_Carlos_Mougan (1) - Version of Record
Text: Final-thesis-submission-Examination-Mr-Carlos-Mougan (2) - Restricted to Repository staff only
More information
Published date: February 2025
Identifiers
Local EPrints ID: 498652
URI: http://eprints.soton.ac.uk/id/eprint/498652
PURE UUID: 3729e041-bcc0-4512-a763-ce34c491e0af
Catalogue record
Date deposited: 25 Feb 2025 17:31
Last modified: 22 Aug 2025 02:13
Contributors
Author:
Carlos Mougan
Thesis advisor:
Steffen Staab
Thesis advisor:
Thanassis Tiropanis