The University of Southampton
University of Southampton Institutional Repository

Persistence-based summaries for data analysis with applications to cyber security

Davies, Thomas (2023) Persistence-based summaries for data analysis with applications to cyber security. University of Southampton, Doctoral Thesis, 164pp.

Record type: Thesis (Doctoral)

Abstract

First formalised by Poincaré in his seminal text Analysis Situs, meaning the geometry of position, topology is the mathematical study of structure that remains invariant under continuous deformation. For over a hundred years this understanding of shape was confined to pure mathematics, but the advent of persistence-based summaries, which enable practitioners to compute concise representations of the topology of data with strong theoretical guarantees, has led to applications of the topological notion of shape in data analysis and machine learning. The first part of this thesis is concerned with understanding and extending the application of persistence-based summaries to machine learning. Motivated by an investigation into the utility of topological loss terms through the lens of statistical learning theory, we adapt a recent extension of the higher-order Laplacian to the persistent case for machine learning, suggesting a vectorisation scheme and baselining its efficacy on the MNIST and MoleculeNet datasets. We find that it outperforms persistent homology across all of our baseline tasks. We also extend the ubiquitous fuzzy c-means clustering algorithm to the space of persistence diagrams, proving the same convergence guarantees as in the Euclidean case. We apply the fuzzy clustering algorithm to model selection, matching pre-trained deep learning models to datasets via the topology of their decision boundaries. In the second part of this thesis we consider applications of persistence-based summaries to cyber security. Cyber security is a critical application domain: the annual cost of cyber crime to the UK economy is estimated to exceed £27 billion, and the UK government considers cyber attacks a tier 1 national security risk. We investigate the utility of persistence-based summaries for detecting malicious behaviour in host-based computer logs, which are intrinsically highly structured. We find that our methods can rival a standard baseline from the literature.

Text
Thomas_Davies_Doctoral_thesis_pdfa - Version of Record
Available under License University of Southampton Thesis Licence.
Download (8MB)
Text
Final-thesis-submission-Examination-Mr-Thomas-Davies
Restricted to Repository staff only
Available under License University of Southampton Thesis Licence.

More information

Submitted date: April 2023
Published date: September 2023

Identifiers

Local EPrints ID: 481621
URI: http://eprints.soton.ac.uk/id/eprint/481621
PURE UUID: 1bd9009b-a0a7-43c4-8e80-fd6a6cff7e3a
ORCID for Ruben Sanchez Garcia: orcid.org/0000-0001-6479-3028
ORCID for Corina Cirstea: orcid.org/0000-0003-3165-5678

Catalogue record

Date deposited: 05 Sep 2023 16:35
Last modified: 06 Jun 2024 01:48

Contributors

Author: Thomas Davies
Thesis advisor: Ruben Sanchez Garcia
Thesis advisor: Long Tran-Thanh
Thesis advisor: Corina Cirstea
