The University of Southampton
University of Southampton Institutional Repository

Data mining approaches for network intrusion detection: from dimensionality reduction to misuse and anomaly detection

Syarif, Iwan, Prugel-Bennett, Adam and Wills, Gary (2012) Data mining approaches for network intrusion detection: from dimensionality reduction to misuse and anomaly detection. Journal of Information Technology Review, 3 (2), 70-83.

Record type: Article

Abstract

This paper describes the use of data mining techniques to address three important issues in network intrusion detection. The first goal is to find the best dimensionality reduction algorithm, one that reduces the computational cost while still maintaining accuracy. We implement both feature extraction (Principal Component Analysis and Independent Component Analysis) and feature selection (Genetic Algorithm and Particle Swarm Optimization) techniques for dimensionality reduction. The second goal is to find the best algorithm for a misuse detection system that detects known intrusions. We implement four basic machine learning algorithms (Naïve Bayes, Decision Tree, Nearest Neighbour and Rule Induction) and then apply ensemble algorithms such as bagging, boosting and stacking to improve the performance of these four basic algorithms. The third goal is to find the best clustering algorithm for detecting network anomalies that contain unknown intrusions. We analyze and compare the performance of four unsupervised clustering algorithms (k-Means, k-Medoids, EM clustering and distance-based outlier detection) in terms of accuracy and false positives.
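As an illustration of the distance-based outlier detection idea named above, the following minimal sketch scores each record by the distance to its k-th nearest neighbour and flags the highest-scoring records as anomalies. It is not the authors' implementation: the function names, the parameters `k` and `n_outliers`, and the toy records are all assumptions made for the example.

```python
import math


def kth_neighbour_distance(points, k):
    """Return, for each point, the Euclidean distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_outliers(points, k=2, n_outliers=1):
    """Flag the n_outliers points whose k-th neighbour distance is largest.

    k and n_outliers are hypothetical defaults, not values from the paper.
    """
    scores = kth_neighbour_distance(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_outliers])
    return [i in flagged for i in range(len(points))]


# Toy data (assumed): four tightly grouped "normal" records and one distant record.
records = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
flags = flag_outliers(records, k=2, n_outliers=1)
```

This brute-force version compares every pair of records, so it is only suitable for small samples; a practical system would likely use an indexed neighbour search.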

Our experiment shows that the Nearest Neighbour (NN) classifier, when implemented with Particle Swarm Optimization (PSO) as an attribute selection algorithm, achieved the best performance: 99.71% accuracy and 0.27% false positives. The misuse detection technique performs very well, with more than 99% accuracy when detecting known intrusions, but it performs poorly on a data set containing a large number of unknown intrusions, where the highest accuracy is only 63.97%. In contrast, the anomaly detection approach shows promising results: the distance-based outlier detection method outperforms the other three clustering algorithms with an accuracy of 80.15%, followed by EM clustering (78.06%), k-Medoids (76.71%), improved k-Means (65.40%) and k-Means (57.81%).
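The abstract reports results as accuracy and false positive percentages. The sketch below shows one common way to compute the two figures from true and predicted labels. The abstract does not state the exact definitions used in the paper, so the formulas here (accuracy as the share of correct predictions, false positive rate as the share of normal records flagged as attacks) are assumptions, as are the label names and the toy data.

```python
def confusion_counts(y_true, y_pred, positive="attack"):
    """Count true/false positives and negatives, treating `positive` as the attack label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all records that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def false_positive_rate(fp, tn):
    """Share of normal records that were wrongly flagged as attacks."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Toy labels (assumed): four normal records and two attacks.
y_true = ["normal", "normal", "normal", "normal", "attack", "attack"]
y_pred = ["normal", "normal", "normal", "attack", "attack", "normal"]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, tn, fn)
fpr = false_positive_rate(fp, tn)
```

Reporting both figures matters for intrusion detection because normal traffic usually dominates, so a detector can reach high accuracy while still raising an impractical number of false alarms.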

Text
Data_mining_approaches_for_network_intrusion_detection_Journal_-_abstact.pdf - Other
Download (26kB)
Text
Iwan - JITR.pdf - Other
Download (193kB)

More information

Published date: May 2012
Keywords: intrusion detection system, anomaly detection, misuse detection, feature selection, clustering, ensemble classifiers
Organisations: Electronics & Computer Science

Identifiers

Local EPrints ID: 342811
URI: http://eprints.soton.ac.uk/id/eprint/342811
ISSN: 0976-3511
PURE UUID: 1ec16ab7-5a52-4cc0-8f4c-eeb9657d939b
ORCID for Gary Wills: orcid.org/0000-0001-5771-4088

Catalogue record

Date deposited: 14 Sep 2012 07:44
Last modified: 15 Mar 2024 02:51

Contributors

Author: Iwan Syarif
Author: Adam Prugel-Bennett
Author: Gary Wills

