The University of Southampton
University of Southampton Institutional Repository

On the efficiency of data collection for multiple Naïve Bayes classifiers


Manino, Edoardo, Tran-Thanh, Long and Jennings, Nicholas (2019) On the efficiency of data collection for multiple Naïve Bayes classifiers. Artificial Intelligence, 275, 57-78. (doi:10.1016/j.artint.2019.06.010).

Record type: Article

Abstract

Many classification problems are solved by aggregating the output of a group of distinct predictors. In this respect, a popular choice is to assume independence and employ a Naïve Bayes classifier. When we have not just one but multiple classification problems at the same time, the question of how to assign the limited pool of available predictors to the individual classification problems arises. Empirical studies show that the policies we use to perform such assignments have a strong impact on the accuracy of the system. However, to date there is little theoretical understanding of this phenomenon. To help rectify this, in this paper we provide the first theoretical explanation of the accuracy gap between the most popular policies: the non-adaptive uniform allocation, and the adaptive allocation schemes based on uncertainty sampling and information gain maximisation. To do so, we propose a novel representation of the data collection process in terms of random walks. Then, we use this tool to derive new lower and upper bounds on the accuracy of the policies. These bounds reveal that the tradeoff between the number of available predictors and the accuracy has a different exponential rate depending on the policy used. By comparing them, we are able to quantify the advantage that the two adaptive policies have over the non-adaptive one for the first time, and prove that the probability of error of the former decays at more than double the exponential rate of the latter. Furthermore, we show in our analysis that this result holds both in the case where we know the accuracy of each individual predictor, and in the case where we only have access to a noisy estimate of it.
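
As a rough illustration of the setting described in the abstract (not the paper's own code or analysis), the sketch below simulates several binary classification tasks whose labels are collected one vote at a time. It assumes that every predictor has the same known accuracy p, so the Naïve Bayes log-odds of a task is proportional to its net vote count and evolves as a simple random walk. All function names and parameter values are hypothetical, chosen only for this example.

```python
import math
import random


def vote(truth, p, rng):
    """Return a +1/-1 label that matches `truth` with probability p."""
    return truth if rng.random() < p else -truth


def uniform_policy(truths, budget, p, rng):
    """Non-adaptive allocation: spread the budget evenly, round-robin."""
    position = [0] * len(truths)       # net votes = random-walk position
    for step in range(budget):
        i = step % len(truths)
        position[i] += vote(truths[i], p, rng)
    return position


def uncertainty_policy(truths, budget, p, rng):
    """Adaptive allocation: query the task whose posterior is closest to 50/50.

    With equal-accuracy predictors this is the task with the smallest |position|.
    """
    position = [0] * len(truths)
    for _ in range(budget):
        i = min(range(len(truths)), key=lambda j: abs(position[j]))
        position[i] += vote(truths[i], p, rng)
    return position


def log_odds(position, p):
    """Naive Bayes log-odds of the +1 class given the net vote count."""
    return position * math.log(p / (1 - p))


def error_rate(policy, n_tasks=50, votes_per_task=9, p=0.7, runs=100, seed=0):
    """Estimate the fraction of misclassified tasks under a given policy."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(runs):
        truths = [rng.choice((-1, 1)) for _ in range(n_tasks)]
        positions = policy(truths, n_tasks * votes_per_task, p, rng)
        for truth, pos in zip(truths, positions):
            z = log_odds(pos, p)
            # Predict the sign of the log-odds; break exact ties with a coin flip.
            guess = rng.choice((-1, 1)) if z == 0 else (1 if z > 0 else -1)
            errors += guess != truth
    return errors / (runs * n_tasks)


if __name__ == "__main__":
    print("uniform allocation  :", error_rate(uniform_policy))
    print("uncertainty sampling:", error_rate(uncertainty_policy))
```

Both policies spend the same total budget; the adaptive one simply stops querying tasks that are already far from the decision boundary. This toy simulation only hints at the accuracy gap and does not reproduce the bounds derived in the paper.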

More information

Accepted/In Press date: 30 June 2019
e-pub ahead of print date: 2 July 2019
Published date: November 2019

Identifiers

Local EPrints ID: 432936
URI: https://eprints.soton.ac.uk/id/eprint/432936
ISSN: 0004-3702
PURE UUID: 0f2fd0fe-1c65-4c15-b3b2-9517f0a05bc4
ORCID for Edoardo Manino: orcid.org/0000-0003-0028-5440
ORCID for Long Tran-Thanh: orcid.org/0000-0003-1617-8316

Catalogue record

Date deposited: 01 Aug 2019 16:30
Last modified: 02 Nov 2019 01:34
