University of Southampton Institutional Repository

Data mining techniques for software effort estimation: a comparative study


Dejaeger, Karel, Verbeke, Wouter, Martens, David and Baesens, Bart (2012) Data mining techniques for software effort estimation: a comparative study. IEEE Transactions on Software Engineering, 38 (2), 375-397. (doi:10.1109/TSE.2011.55).

Record type: Article

Abstract

A predictive model is required to be accurate and comprehensible in order to inspire confidence in a business setting. Both aspects have been assessed in a software effort estimation setting by previous studies. However, no univocal conclusion as to which technique is the most suited has been reached. This study addresses this issue by reporting on the results of a large-scale benchmarking study. Different types of techniques are under consideration, including techniques inducing tree/rule-based models like M5 and CART, linear models such as various types of linear regression, nonlinear models (MARS, multilayered perceptron neural networks, radial basis function networks, and least squares support vector machines), and estimation techniques that do not explicitly induce a model (e.g., a case-based reasoning approach). Furthermore, the aspect of feature subset selection by using a generic backward input selection wrapper is investigated. The results are subjected to rigorous statistical testing and indicate that ordinary least squares regression in combination with a logarithmic transformation performs best. Another key finding is that by selecting a subset of highly predictive attributes such as project size, development, and environment-related attributes, typically a significant increase in estimation accuracy can be obtained.
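
For readers unfamiliar with the technique the abstract singles out, the following is a minimal sketch of ordinary least squares regression on log-transformed data, reduced to a single predictor. It is not the authors' implementation. The function names (`fit_log_ols`, `predict_effort`), the single "size" attribute, and the data points are all hypothetical and chosen only for illustration; the study itself works with multiple attributes and real project data sets.

```python
import math


def fit_log_ols(sizes, efforts):
    """Fit log(effort) = intercept + slope * log(size) by ordinary least squares.

    Hypothetical helper for illustration; assumes all inputs are positive.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form OLS solution for one predictor.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_effort(intercept, slope, size):
    """Back-transform a prediction from the log scale to the original scale."""
    return math.exp(intercept + slope * math.log(size))


# Made-up data generated exactly from effort = 3 * size ** 1.1 (no noise),
# so the fit should recover slope 1.1 and exp(intercept) 3.
sizes = [10.0, 50.0, 100.0, 400.0, 1000.0]
efforts = [3.0 * s ** 1.1 for s in sizes]

intercept, slope = fit_log_ols(sizes, efforts)
estimate = predict_effort(intercept, slope, 200.0)
print(f"slope={slope:.3f}, multiplier={math.exp(intercept):.3f}, estimate={estimate:.1f}")
```

Fitting on the log scale turns a multiplicative relationship between size and effort into a linear one, which is one common motivation for the transformation. Note that exponentiating a log-scale prediction gives an estimate on the original scale that, with noisy data, tends to sit below the mean effort, so some practitioners apply a correction factor.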

This record has no associated files available for download.

More information

Published date: 2012
Keywords: data mining, software effort estimation, regression
Organisations: Southampton Business School

Identifiers

Local EPrints ID: 336472
URI: http://eprints.soton.ac.uk/id/eprint/336472
PURE UUID: 2dd352e3-9a86-4054-95c6-aad94095a982
ORCID for Bart Baesens: orcid.org/0000-0002-5831-5668

Catalogue record

Date deposited: 27 Mar 2012 11:55
Last modified: 15 Mar 2024 03:20

Contributors

Author: Karel Dejaeger
Author: Wouter Verbeke
Author: David Martens
Author: Bart Baesens
