University of Southampton Institutional Repository

Optimizing predictive performance of criminal recidivism models using registration data with binary and survival outcomes

Tollenaar, Nikolaj and Van Der Heijden, Peter (2019) Optimizing predictive performance of criminal recidivism models using registration data with binary and survival outcomes. PLoS ONE, 14 (3), 1-37, [e0213245]. (doi:10.1371/journal.pone.0213245).

Record type: Article

Abstract

In a recidivism prediction context, there is no consensus on which modeling strategy should be followed to obtain an optimal prediction model. In previous papers, a range of statistical and machine learning techniques were benchmarked on recidivism data with a binary outcome. However, two important tree ensemble methods, namely gradient boosting and random forests, were not extensively evaluated. In this paper, we further explore the modeling potential of these techniques in the binary outcome criminal prediction context. Additionally, we explore the predictive potential of classical statistical and machine learning methods for censored time-to-event data. A range of manually specified statistical and (semi-)automatic machine learning models is fitted on Dutch recidivism data, both for the binary outcome case and the censored outcome case. To enhance generalizability of results, the same models are applied to two historical American data sets, the North Carolina prison data. For all datasets, (semi-)automatic modeling in the binary case seems to provide no improvement over an appropriately manually specified traditional statistical model. There is, however, evidence of slightly improved performance of gradient boosting in survival data. Results on the reconviction data from two sources suggest that both statistical and machine learning methods should be tried to obtain an optimal model. Even if a flexible black-box model does not improve upon the predictions of a manually specified model, it can serve as a test of whether important interactions are missing or other misspecifications of the model are present, and can thus provide more security in the modeling process.

Text
Tollenaar and van der Heijden 2019 journal.pone.0213245 - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 19 February 2019
Published date: 8 March 2019

Identifiers

Local EPrints ID: 429063
URI: http://eprints.soton.ac.uk/id/eprint/429063
ISSN: 1932-6203
PURE UUID: 7461f63e-5eec-44d2-bfe6-9cfe27c5f407
ORCID for Peter Van Der Heijden: orcid.org/0000-0002-3345-096X

Catalogue record

Date deposited: 20 Mar 2019 17:30
Last modified: 16 Mar 2024 04:14

Contributors

Author: Nikolaj Tollenaar
Author: Peter Van Der Heijden
