The University of Southampton
University of Southampton Institutional Repository

On time series clustering with k-means

arXiv
Holder, Christopher
Bagnall, Anthony
Lines, Jason

Abstract

There is a long history of research into time series clustering (TSCL) using distance-based partitional clustering. Many of the most popular algorithms adapt k-means (also known as Lloyd's algorithm) to exploit time dependencies in the data by specifying a time series distance function. However, these algorithms are often presented with k-means configured in different ways, with changes to key parameters such as the initialisation strategy. Because k-means is known to be highly sensitive to its configuration, this variability makes it difficult to compare studies. To address this, we propose a standard Lloyd's-based model for TSCL that adopts an end-to-end approach, incorporating a specialised distance function not only in the assignment step but also in the initialisation and stopping criteria. This gives a unified structure for comparing seven popular Lloyd's-based TSCL algorithms. The common framework makes it easier to attribute differences in clustering performance to the distance function itself, rather than to variations in the k-means configuration.
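The end-to-end idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`dtw_distance`, `lloyds_tscl`), the choice of dynamic time warping as the example distance, the k-means++-style seeding, and the arithmetic-mean centroid update are all assumptions made for illustration. The sketch only shows one pluggable distance function driving the initialisation, the assignment step, and the stopping criterion.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic time warping cost between two 1-D series (squared pointwise cost).

    Used here only as an example of a time series distance (assumption).
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def lloyds_tscl(X, k, distance=dtw_distance, max_iter=50, tol=1e-6, seed=0):
    """Hypothetical Lloyd's-style loop where `distance` is used end to end.

    X is an array of shape (n_series, series_length).
    Returns (labels, centres, inertia).
    """
    rng = np.random.default_rng(seed)
    n = len(X)

    # Initialisation: k-means++-style seeding with the supplied distance
    # (assumption: the abstract does not name the seeding strategy).
    centres = [X[rng.integers(n)]]
    while len(centres) < k:
        d = np.array([min(distance(x, c) for c in centres) for x in X])
        probs = d / d.sum() if d.sum() > 0 else np.full(n, 1.0 / n)
        centres.append(X[rng.choice(n, p=probs)])
    centres = np.array(centres, dtype=float)

    prev_inertia = np.inf
    for _ in range(max_iter):
        # Assignment step: nearest centre under the same distance.
        dists = np.array([[distance(x, c) for c in centres] for x in X])
        labels = dists.argmin(axis=1)

        # Stopping criterion: change in inertia measured with the same distance.
        inertia = dists[np.arange(n), labels].sum()
        if prev_inertia - inertia < tol:
            break
        prev_inertia = inertia

        # Update step: arithmetic mean as a placeholder averaging method
        # (assumption: a distance-specific average could be swapped in here).
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centres[j] = members.mean(axis=0)

    return labels, centres, inertia
```

Passing a different callable as `distance` changes the seeding, assignment, and convergence check together, which is the property that lets differences in results be attributed to the distance rather than to the surrounding k-means configuration.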

Text
2410.14269v1 - Author's Original
Available under License Creative Commons Attribution.

More information

Published date: 18 October 2024
Keywords: cs.LG

Identifiers

Local EPrints ID: 498990
URI: http://eprints.soton.ac.uk/id/eprint/498990
PURE UUID: 60de8b06-e6c0-41f5-92f6-b367e94026e9
ORCID for Anthony Bagnall: orcid.org/0000-0003-2360-8994

Catalogue record

Date deposited: 06 Mar 2025 17:38
Last modified: 07 Mar 2025 03:08

Contributors

Author: Christopher Holder
Author: Anthony Bagnall
Author: Jason Lines
