Data mining approaches for network intrusion detection: from dimensionality reduction to misuse and anomaly detection
This paper describes the use of data mining techniques to address three important issues in network intrusion detection. The first goal is to find the dimensionality reduction algorithm that best reduces computational cost while maintaining accuracy. We implement both feature extraction (Principal Component Analysis and Independent Component Analysis) and feature selection (Genetic Algorithm and Particle Swarm Optimization) techniques for dimensionality reduction. The second goal is to find the best algorithm for a misuse detection system that detects known intrusions. We implement four basic machine learning algorithms (Naïve Bayes, Decision Tree, Nearest Neighbour and Rule Induction) and then apply ensemble methods such as bagging, boosting and stacking to improve their performance. The third goal is to find the best clustering algorithm for detecting network anomalies, which include unknown intrusions. We analyze and compare the performance of four unsupervised clustering algorithms (k-Means, k-Medoids, EM clustering and distance-based outlier detection) in terms of accuracy and false positives.
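To illustrate the last method in that list, here is a minimal sketch of one common formulation of distance-based outlier detection. It assumes each connection record is a numeric feature vector, scores every record by its Euclidean distance to its k-th nearest neighbour, and flags the n highest-scoring records as anomalies. The function names, the Euclidean metric and the parameters `k` and `n` are illustrative assumptions. This record does not state which variant or settings the authors used. The code requires Python 3.8+ for `math.dist`.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_distance_scores(points: Sequence[Point], k: int) -> List[float]:
    """Score each point by its Euclidean distance to its k-th nearest neighbour.

    Assumption: Euclidean distance and k-th-neighbour scoring; the paper's
    exact outlier-detection variant and parameters are not stated.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_n_outliers(points: Sequence[Point], k: int, n: int) -> List[int]:
    """Return the indices of the n points with the largest k-NN distance."""
    scores = knn_distance_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Toy 2-D records: a tight cluster of "normal" points plus one distant point.
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    print(top_n_outliers(data, k=2, n=1))  # → [4]
```

Because the method needs no labels, it can flag previously unseen traffic patterns. This fits the abstract's use of clustering and outlier methods for unknown intrusions.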
Our experiments show that the Nearest Neighbour (NN) classifier, combined with Particle Swarm Optimization (PSO) as the attribute selection algorithm, achieved the best performance: 99.71% accuracy and a 0.27% false positive rate. The misuse detection technique performs very well on known intrusions, with more than 99% accuracy. However, it fails to accurately classify a data set containing a large number of unknown intrusions, where the highest accuracy is only 63.97%. In contrast, the anomaly detection approach shows promising results. The distance-based outlier detection method outperforms the other three clustering algorithms with an accuracy of 80.15%, followed by EM clustering (78.06%), k-Medoids (76.71%), improved k-Means (65.40%) and k-Means (57.81%).
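The results above are reported as accuracy and false positives. The sketch below shows how both are commonly computed from binary attack/normal labels. It assumes the usual definitions: accuracy = (TP + TN) / total, and false positive rate = FP / (FP + TN). The abstract does not define its false positive measure, so treat this as an assumption rather than the authors' evaluation code. The `Confusion` class and the `confusion_from_labels` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    """Binary confusion counts with 'attack' as the positive class (an assumption)."""
    tp: int  # attacks correctly flagged
    tn: int  # normal records correctly passed
    fp: int  # normal records wrongly flagged as attacks (false alarms)
    fn: int  # attacks that were missed

    def accuracy(self) -> float:
        total = self.tp + self.tn + self.fp + self.fn
        return (self.tp + self.tn) / total if total else 0.0

    def false_positive_rate(self) -> float:
        # Assumed definition: share of normal records raised as alarms.
        negatives = self.fp + self.tn
        return self.fp / negatives if negatives else 0.0


def confusion_from_labels(y_true: Sequence[str], y_pred: Sequence[str],
                          positive: str = "attack") -> Confusion:
    """Hypothetical helper: tally confusion counts from two label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, tn, fp, fn)


if __name__ == "__main__":
    y_true = ["normal", "normal", "attack", "attack", "normal"]
    y_pred = ["normal", "attack", "attack", "normal", "normal"]
    c = confusion_from_labels(y_true, y_pred)
    print(c.accuracy(), round(c.false_positive_rate(), 3))  # → 0.6 0.333
```

In the toy example, three of five records are classified correctly, and one of the three normal records raises a false alarm.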
Keywords: intrusion detection system, anomaly detection, misuse detection, feature selection, clustering, ensemble classifiers
Pages: 70-83
Syarif, Iwan
Prugel-Bennett, Adam
Wills, Gary
May 2012
Syarif, Iwan, Prugel-Bennett, Adam and Wills, Gary (2012) Data mining approaches for network intrusion detection: from dimensionality reduction to misuse and anomaly detection. Journal of Information Technology Review, 3 (2), 70-83.
More information
Published date: May 2012
Organisations:
Electronics & Computer Science
Identifiers
Local EPrints ID: 342811
URI: http://eprints.soton.ac.uk/id/eprint/342811
ISSN: 0976-3511
PURE UUID: 1ec16ab7-5a52-4cc0-8f4c-eeb9657d939b
Catalogue record
Date deposited: 14 Sep 2012 07:44
Last modified: 15 Mar 2024 02:51