Towards accurate duplicate bug retrieval using deep learning techniques
Pages: 115-124
Deshmukh, Jayati, Annervaz, K. M., Podder, Sanjay, Sengupta, Shubhashis and Dubash, Neville (2017) Towards accurate duplicate bug retrieval using deep learning techniques. In Proceedings - 2017 IEEE International Conference on Software Maintenance and Evolution, ICSME 2017. IEEE. pp. 115-124. (doi:10.1109/ICSME.2017.69).
Record type: Conference or Workshop Item (Paper)
Abstract
Duplicate bug detection is the problem of identifying whether a newly reported bug duplicates an existing bug in the system, and of retrieving the original or similar bugs from the past. Solving it avoids costly rediscovery and redundant work. In typical software projects, the number of duplicate bug reports may run into the thousands, which makes manual intervention expensive in both cost and time. This makes duplicate or similar bug detection an important problem in the software engineering domain. However, automated solutions are not yet accurate enough in practice, despite many reported approaches using various machine learning techniques. In this work, we propose a retrieval and classification model using Siamese Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) for accurate detection and retrieval of duplicate and similar bugs. We report an accuracy close to 90% and a recall rate close to 80%, which makes practical use of such a system possible. We describe our model in detail, along with related discussions from the deep learning domain. By presenting detailed experimental results, we illustrate the effectiveness of the model in practical systems, including repositories for which supervised training data is not available.
This record has no associated files available for download.
More information
Published date: 2 November 2017
Additional Information: Publisher Copyright: © 2017 IEEE.
Venue - Dates: 2017 IEEE International Conference on Software Maintenance and Evolution, ICSME 2017, Shanghai, China, 2017-09-19 - 2017-09-22
Keywords: Convolutional neural networks, Deep learning, Duplicate bug detection, Information retrieval, Long short term memory, Natural language processing, Siamese networks, Word embeddings
Identifiers
Local EPrints ID: 493372
URI: http://eprints.soton.ac.uk/id/eprint/493372
PURE UUID: d1072bfe-590f-4de9-aa16-e3ae455fba87
Catalogue record
Date deposited: 30 Aug 2024 17:09
Last modified: 31 Aug 2024 02:12
Contributors
Author: Jayati Deshmukh
Author: K. M. Annervaz
Author: Sanjay Podder
Author: Shubhashis Sengupta
Author: Neville Dubash