The University of Southampton
University of Southampton Institutional Repository

Towards accurate duplicate bug retrieval using deep learning techniques

Deshmukh, Jayati, Annervaz, K. M., Podder, Sanjay, Sengupta, Shubhashis and Dubash, Neville (2017) Towards accurate duplicate bug retrieval using deep learning techniques. In Proceedings - 2017 IEEE International Conference on Software Maintenance and Evolution, ICSME 2017. IEEE. pp. 115-124. (doi:10.1109/ICSME.2017.69).

Record type: Conference or Workshop Item (Paper)

Abstract

Duplicate bug detection is the problem of identifying whether a newly reported bug duplicates an existing bug in the system, and of retrieving the original or similar bugs from the past. It is needed to avoid costly rediscovery and redundant work. In typical software projects, the number of duplicate bugs reported may run into the thousands, making manual intervention expensive in both cost and time. This makes duplicate or similar bug detection an important problem in the Software Engineering domain. However, automated solutions are not yet accurate enough in practice, despite many reported approaches using various machine learning techniques. In this work, we propose a retrieval and classification model using Siamese Convolutional Neural Networks (CNN) and Long Short Term Memory (LSTM) for accurate detection and retrieval of duplicate and similar bugs. We report an accuracy close to 90% and a recall rate close to 80%, which makes practical use of such a system possible. We describe our model in detail, along with related discussions from the Deep Learning domain. By presenting detailed experimental results, we illustrate the effectiveness of the model in practical systems, including for repositories for which supervised training data is not available.

This record has no associated files available for download.

More information

Published date: 2 November 2017
Additional Information: Publisher Copyright: © 2017 IEEE.
Venue - Dates: 2017 IEEE International Conference on Software Maintenance and Evolution, ICSME 2017, Shanghai, China, 2017-09-19 - 2017-09-22
Keywords: Convolutional neural networks, Deep learning, Duplicate bug detection, Information retrieval, Long short term memory, Natural language processing, Siamese networks, Word embeddings

Identifiers

Local EPrints ID: 493372
URI: http://eprints.soton.ac.uk/id/eprint/493372
PURE UUID: d1072bfe-590f-4de9-aa16-e3ae455fba87
ORCID for Jayati Deshmukh: orcid.org/0000-0002-1144-2635

Catalogue record

Date deposited: 30 Aug 2024 17:09
Last modified: 31 Aug 2024 02:12

Contributors

Author: Jayati Deshmukh
Author: K. M. Annervaz
Author: Sanjay Podder
Author: Shubhashis Sengupta
Author: Neville Dubash
