A comparison of state-of-the-art classification techniques for expert automobile insurance fraud detection

Several state-of-the-art binary classification techniques are experimentally evaluated in the context of expert automobile insurance claim fraud detection. The predictive power of logistic regression, C4.5 decision tree, k-nearest neighbor, Bayesian learning multilayer perceptron neural network, least-squares support vector machine, naive Bayes, and tree-augmented naive Bayes classification is contrasted. For most of these algorithm types, we report on several operationalizations using alternative hyperparameter or design choices. We compare these in terms of mean percentage correctly classified (PCC) and mean area under the receiver operating characteristic (AUROC) curve using a stratified, blocked, ten-fold cross-validation experiment.
We also contrast algorithm type performance visually by means of the convex hull of the receiver operating characteristic (ROC) curves associated with the alternative operationalizations per algorithm type. The study is based on a data set of 1,399 personal injury protection claims from 1993 accidents collected by the Automobile Insurers Bureau of Massachusetts. To stay as close to real-life operating conditions as possible, we consider only predictors that are known relatively early in the life of a claim. Furthermore, based on the qualification of each available claim by both a verbal expert assessment of suspicion of fraud and a ten-point-scale expert suspicion score, we can compare classification for different target/class encoding schemes.
Finally, we also investigate the added value of systematically collecting nonflag predictors for suspicion of fraud modeling purposes. From the observed results, we may state that: (1) independent of the target encoding scheme and the algorithm type, the inclusion of nonflag predictors allows us to significantly boost predictive performance; (2) for all the evaluated scenarios, the performance difference in terms of mean PCC and mean AUROC between many algorithm type operationalizations turns out to be rather small; visual comparison of the algorithm type ROC curve convex hulls also shows limited difference in performance over the range of operating conditions; (3) relatively simple and efficient techniques such as linear logistic regression and linear kernel least-squares support vector machine classification show excellent overall predictive capabilities, and (smoothed) naive Bayes also performs well; and (4) the C4.5 decision tree operationalization results are rather disappointing; none of the tree operationalizations are capable of attaining mean AUROC performance in line with the best. Visual inspection of the evaluated scenarios reveals that the C4.5 algorithm type ROC curve convex hull is often dominated in large part by most of the other algorithm type hulls.
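The evaluation protocol summarized above (mean PCC and mean AUROC over a stratified ten-fold cross-validation) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical, class 1 is assumed to denote a claim labeled suspicious, PCC is computed at an assumed cut-off of 0.5 on the classifier score, and AUROC is computed with the Mann-Whitney rank statistic. Fitting the classifiers is left out. Reusing one fold assignment for every classifier is one way to read the "blocked" aspect of the design, since all operationalizations are then scored on identical partitions.

```python
import random
from collections import defaultdict


def pcc(labels, scores, threshold=0.5):
    """Percentage correctly classified; score >= threshold predicts class 1.
    The 0.5 threshold is an assumption for illustration."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return 100.0 * correct / len(labels)


def auroc(labels, scores):
    """AUROC as the probability that a random class-1 example outscores a
    random class-0 example (Mann-Whitney statistic); ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=10, seed=0):
    """Split example indices into k folds with roughly equal class proportions.
    Reusing the returned folds for every classifier keeps comparisons paired."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)  # deal each class round-robin over the folds
    return folds


def mean_fold_metrics(labels, scores, folds):
    """Mean PCC and mean AUROC over folds. `scores[i]` is assumed to be the
    out-of-fold score, i.e. produced while example i was in the test fold."""
    pccs, aucs = [], []
    for fold in folds:
        y = [labels[i] for i in fold]
        s = [scores[i] for i in fold]
        pccs.append(pcc(y, s))
        aucs.append(auroc(y, s))
    return sum(pccs) / len(pccs), sum(aucs) / len(aucs)
```

For example, with 20 class-1 and 80 class-0 labels, `stratified_folds(labels)` places two class-1 and eight class-0 indices in each of the ten folds, so every fold contains both classes and AUROC is defined per fold.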
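The ROC convex hull per algorithm type can likewise be sketched: collect the (false positive rate, true positive rate) points of every operationalization of one algorithm type, add the trivial classifiers at (0, 0) and (1, 1), and keep the upper convex hull of the union. The sketch below assumes class 1 is the positive (suspicious) class and uses Andrew's monotone-chain algorithm for the hull; the function names are hypothetical and not taken from the paper.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points obtained by lowering the threshold over the distinct
    scores; an example is predicted class 1 when its score >= threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(point_sets):
    """Upper convex hull of the union of several ROC point sets,
    running from (0, 0) to (1, 1)."""
    points = {p for ps in point_sets for p in ps}
    points |= {(0.0, 0.0), (1.0, 1.0)}
    hull = []
    for p in sorted(points):
        # drop the last hull point while it lies on or below the chord to p
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull
```

Passing the point sets of all operationalizations of one algorithm type yields that type's hull; where one type's hull lies above another's, it dominates at those operating conditions.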

ISSN: 0022-4367

Pages: 373-421

URL: http://www.ingentaconnect.com/content/bpl/jori/200...

Viaene, Stijn

Derrig, Richard A.

Baesens, Bart

Dedene, Guido

2002

Viaene, Stijn, Derrig, Richard A., Baesens, Bart and Dedene, Guido (2002) A comparison of state-of-the-art classification techniques for expert automobile insurance fraud detection. Journal of Risk and Insurance, 69 (3), 373-421.

Record type: Article

More information

Published date: 2002

Organisations: Management

Identifiers

Local EPrints ID: 36738

URI: http://eprints.soton.ac.uk/id/eprint/36738

ISSN: 0022-4367

PURE UUID: e5c15776-8ef5-4fe8-b6e8-c03ddb98b0f9

ORCID for Bart Baesens: orcid.org/0000-0002-5831-5668

Catalogue record

Date deposited: 25 May 2006

Last modified: 28 Apr 2022 01:53

Contributors

Author: Stijn Viaene

Author: Richard A. Derrig

Author: Bart Baesens

Author: Guido Dedene
