Explanation shift: how did the distribution shift impact the model?

The performance of machine learning models on new data is critical for their success in real-world applications. Current methods for detecting shifts in the input or output data distributions are limited in their ability to identify changes in model behavior when no labelled data is available. In this paper, we define \emph{explanation shift} as the statistical comparison between how predictions on training data are explained and how predictions on new data are explained. We propose explanation shift as a key indicator for investigating the interaction between distribution shifts and learned models. We introduce an Explanation Shift Detector that operates on the explanation distributions, providing more sensitive and explainable indicators of changes in the interaction between distribution shifts and learned models. We compare explanation shifts with other methods based on distribution shifts, showing that monitoring for explanation shifts yields more sensitive indicators of changes in model behavior. We provide theoretical and experimental evidence and demonstrate the effectiveness of our approach on synthetic and real data. Additionally, we release an open-source Python package, \texttt{skshift}, which implements our method and provides usage tutorials to support reproducibility.
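As a rough illustration of the idea described in the abstract, the sketch below compares explanation distributions on reference data and on new data. It does not use the \texttt{skshift} API. Everything in it is an assumption made for illustration: the toy data, the helper names \texttt{linear\_explanations} and \texttt{explanation\_shift\_auc}, the use of per-feature attributions of a linear model (coefficient times deviation from the training mean) as the explanation, and the use of a logistic-regression classifier whose held-out AUC serves as the shift indicator. An AUC near 0.5 suggests the two explanation distributions are hard to tell apart; a value well above 0.5 suggests an explanation shift.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def linear_explanations(model, X, background_mean):
    """Hypothetical helper: attribution coef_j * (x_j - mean_j) per feature."""
    return model.coef_ * (X - background_mean)


def explanation_shift_auc(model, X_ref, X_new, background_mean, seed=0):
    """Hypothetical helper: held-out AUC of a classifier that tries to
    separate explanations of reference data from explanations of new data."""
    S = np.vstack([
        linear_explanations(model, X_ref, background_mean),
        linear_explanations(model, X_new, background_mean),
    ])
    # Label 0 marks reference explanations, label 1 marks new explanations.
    z = np.r_[np.zeros(len(X_ref), dtype=int), np.ones(len(X_new), dtype=int)]
    S_fit, S_eval, z_fit, z_eval = train_test_split(
        S, z, test_size=0.5, random_state=seed, stratify=z
    )
    detector = LogisticRegression(max_iter=1000).fit(S_fit, z_fit)
    return roc_auc_score(z_eval, detector.predict_proba(S_eval)[:, 1])


# Toy data (assumed): the target depends on features 0 and 1 but not on feature 2.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 3))
y_train = 2.0 * X_train[:, 0] + X_train[:, 1] + rng.normal(scale=0.1, size=2000)
model = LinearRegression().fit(X_train, y_train)
mean_train = X_train.mean(axis=0)

# Held-out reference sample drawn from the training distribution.
X_ref = rng.normal(size=(2000, 3))

X_same = rng.normal(size=(2000, 3))        # no shift
X_used = rng.normal(size=(2000, 3))
X_used[:, 0] += 1.5                        # shift in a feature the model relies on
X_unused = rng.normal(size=(2000, 3))
X_unused[:, 2] += 1.5                      # shift in a feature the model barely uses

auc_same = explanation_shift_auc(model, X_ref, X_same, mean_train)
auc_used = explanation_shift_auc(model, X_ref, X_used, mean_train)
auc_unused = explanation_shift_auc(model, X_ref, X_unused, mean_train)
print(f"no shift: {auc_same:.2f}  used feature: {auc_used:.2f}  "
      f"unused feature: {auc_unused:.2f}")
```

In this toy setup, a shift in a feature the model relies on moves the explanation distribution and pushes the AUC well above 0.5, whereas a shift in a feature with a near-zero coefficient leaves the explanations almost unchanged. The choice of attribution method and detector here is only one possible instantiation of the comparison.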

Mougan, Carlos

229c7631-f1da-4896-a06a-fd27e77e5742

Broelemann, Klaus

591f0927-c503-465e-8b27-e615a564733c

Kasneci, Gjergji

2991f0cd-5693-4843-9da5-af09e489ce7d

Tiropanis, Thanassis

d06654bd-5513-407b-9acd-6f9b9c5009d8

Staab, Steffen

bf48d51b-bd11-4d58-8e1c-4e6e03b30c49

30 January 2025