The University of Southampton
University of Southampton Institutional Repository

Machine Learning for Intrusion Detection: Modeling the Distribution Shift

Farran, Bassam
Saunders, Craig
Niranjan, Mahesan

Farran, Bassam, Saunders, Craig and Niranjan, Mahesan (2010) Machine Learning for Intrusion Detection: Modeling the Distribution Shift. IEEE Workshop on Machine Learning for Signal Processing, Kittilä, Finland. 29 Aug - 01 Sep 2010.

Record type: Conference or Workshop Item (Paper)

Abstract

This paper addresses two important issues that arise in formulating and solving computer intrusion detection as a machine learning problem, a topic that has attracted considerable attention in recent years, including a community-wide competition using a common data set known as the KDD Cup ’99. The first problem we address is the size of the data set, 5 × 10^6 examples by 41 features, which makes conventional learning algorithms impractical. In previous work, we introduced a one-pass non-parametric classification technique called Voted Spheres, which carves up the input space into a series of overlapping hyperspheres. Training data seen within each hypersphere are used in a voting scheme during testing on unseen data. Secondly, we address the problem of distribution shift, whereby the training and test data may be drawn from slightly different probability densities while the conditional densities of class membership for a given datum remain the same. We adopt two recent techniques from the literature, density weighting and kernel mean matching, to enhance the Voted Spheres technique to deal with such distribution disparities. We demonstrate that substantial performance gains can be achieved using these techniques on the KDD Cup data set.
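
The idea of sphere-based voting with per-example weights can be illustrated with a minimal Python sketch. This is not the authors' Voted Spheres implementation, whose details are not given in this record. It assumes a fixed radius, creates a new sphere whenever a training point falls outside all existing ones, and assumes each training point arrives with a precomputed importance weight (for example, an estimate of the ratio of test density to training density, as density weighting or kernel mean matching might supply). The class name `SphereVoter` and all parameter names are hypothetical.

```python
import math


class SphereVoter:
    """Illustrative one-pass classifier: fixed-radius spheres collect weighted class votes."""

    def __init__(self, radius, n_classes):
        self.radius = radius        # assumed fixed radius shared by all spheres
        self.n_classes = n_classes
        self.centres = []           # one centre per sphere
        self.votes = []             # per-sphere weighted vote totals, one entry per class

    def partial_fit(self, x, y, weight=1.0):
        """Process one training point exactly once (single pass over the data)."""
        hit = False
        for centre, votes in zip(self.centres, self.votes):
            if math.dist(x, centre) <= self.radius:
                votes[y] += weight  # the point votes in every sphere that contains it
                hit = True
        if not hit:
            # No sphere covers x, so x becomes the centre of a new sphere.
            votes = [0.0] * self.n_classes
            votes[y] = weight
            self.centres.append(tuple(x))
            self.votes.append(votes)

    def predict(self, x):
        """Sum the votes of all spheres containing x and return the winning class."""
        total = [0.0] * self.n_classes
        for centre, votes in zip(self.centres, self.votes):
            if math.dist(x, centre) <= self.radius:
                total = [t + v for t, v in zip(total, votes)]
        if not any(total):
            if not self.centres:
                raise ValueError("predict called before any training data was seen")
            # Assumed fallback: use the votes of the nearest sphere.
            nearest = min(range(len(self.centres)),
                          key=lambda i: math.dist(x, self.centres[i]))
            total = self.votes[nearest]
        return max(range(self.n_classes), key=total.__getitem__)


# Usage: unweighted training, then a weighted example.
clf = SphereVoter(radius=1.0, n_classes=2)
for point, label in [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([5.0, 5.0], 1)]:
    clf.partial_fit(point, label)

weighted = SphereVoter(radius=1.0, n_classes=2)
weighted.partial_fit([0.0, 0.0], 0, weight=1.0)
weighted.partial_fit([0.1, 0.0], 1, weight=3.0)  # hypothetical importance weight
```

Because a new centre is created only when a point lies outside every existing sphere, centres are more than one radius apart but may be closer than two radii, so neighbouring spheres can overlap. With all weights equal to 1 the sketch reduces to plain vote counting; under distribution shift, larger weights let training points that resemble the test data dominate the vote.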

Text
FarranMLSP2010.pdf - Other
Download (224kB)

More information

Published date: August 2010
Additional Information: Event Dates: August 29 - September 1, 2010
Venue - Dates: IEEE Workshop on Machine Learning for Signal Processing, Kittilä, Finland, 2010-08-29 - 2010-09-01
Organisations: Southampton Wireless Group

Identifiers

Local EPrints ID: 272869
URI: http://eprints.soton.ac.uk/id/eprint/272869
PURE UUID: bd89b782-ae79-4fe5-9cc9-41976416b670
ORCID for Mahesan Niranjan: orcid.org/0000-0001-7021-140X

Catalogue record

Date deposited: 28 Sep 2011 09:18
Last modified: 15 Mar 2024 03:29

Contributors

Author: Bassam Farran
Author: Craig Saunders
Author: Mahesan Niranjan
