On time series clustering with k-means
cs.LG
Holder, Christopher
fb345cc6-00fa-4256-80ba-a8d3cbdb768b
Bagnall, Anthony
d31e6506-2a00-4358-ba3f-baefd48d59d8
Lines, Jason
5d664e74-7313-445d-8099-cecb63157a2c
18 October 2024
Abstract
There is a long history of research into time series clustering (TSCL) using distance-based partitional clustering. Many of the most popular algorithms adapt k-means (also known as Lloyd's algorithm) to exploit time dependencies in the data by specifying a time series distance function. However, these algorithms are often presented with k-means configured in different ways, with key parameters such as the initialisation strategy varying from study to study. This variability makes it difficult to compare studies because k-means is known to be highly sensitive to its configuration. To address this, we propose a standard Lloyd's-based model for TSCL that adopts an end-to-end approach, incorporating a specialised distance function not only in the assignment step but also in the initialisation and stopping criteria. By doing so, we create a unified structure for comparing seven popular Lloyd's-based TSCL algorithms. This common framework enables us to more easily attribute differences in clustering performance to the distance function itself, rather than to variations in the k-means configuration.
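The end-to-end idea described in the abstract can be illustrated with a minimal sketch in which one pluggable distance function drives the initialisation, the assignment step, and the stopping criterion. This is an illustration under stated assumptions, not the authors' implementation: the function names `lloyds` and `squared_euclidean`, the k-means++-style seeding, the inertia-based stopping rule, the default parameter values, and the arithmetic-mean centre update are all choices made here for the example and are not taken from the record.

```python
import numpy as np


def squared_euclidean(a, b):
    """Placeholder distance (assumption); any function with this
    signature, e.g. an elastic time series distance, could be swapped in."""
    return float(np.sum((a - b) ** 2))


def lloyds(X, k, distance=squared_euclidean, max_iter=50, tol=1e-6, seed=0):
    """Lloyd's-style clustering of equal-length series stored as rows of X.

    The same `distance` is used for seeding, assignment and stopping.
    """
    rng = np.random.default_rng(seed)
    n = len(X)

    # Initialisation: k-means++-style seeding (assumed strategy).
    # Each new centre is sampled with probability proportional to the
    # value `distance` returns to the nearest centre chosen so far.
    centres = [X[rng.integers(n)]]
    for _ in range(1, k):
        d = np.array([min(distance(x, c) for c in centres) for x in X])
        total = d.sum()
        probs = d / total if total > 0 else np.full(n, 1.0 / n)
        centres.append(X[rng.choice(n, p=probs)])
    centres = np.array(centres, dtype=float)

    prev_inertia = np.inf
    for _ in range(max_iter):
        # Assignment: each series goes to its nearest centre.
        dists = np.array([[distance(x, c) for c in centres] for x in X])
        labels = dists.argmin(axis=1)

        # Stopping: halt when the summed distance to assigned centres
        # improves by less than `tol` (or gets worse).
        inertia = float(dists[np.arange(n), labels].sum())
        if prev_inertia - inertia < tol:
            break
        prev_inertia = inertia

        # Update: arithmetic mean of members (assumption); an averaging
        # method matched to the distance could replace this step.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centres[j] = members.mean(axis=0)

    # Labels and inertia come from the most recent assignment step.
    return labels, centres, inertia
```

Calling `lloyds(X, k=3, distance=my_distance)` with a hypothetical `my_distance(a, b)` would then change only the distance while holding the rest of the configuration fixed, which is the kind of controlled comparison the abstract argues for.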
Text: 2410.14269v1 (Author's Original)
More information
Published date: 18 October 2024
Keywords: cs.LG
Identifiers
Local EPrints ID: 498990
URI: http://eprints.soton.ac.uk/id/eprint/498990
PURE UUID: 60de8b06-e6c0-41f5-92f6-b367e94026e9
Catalogue record
Date deposited: 06 Mar 2025 17:38
Last modified: 07 Mar 2025 03:08
Contributors
Author: Christopher Holder
Author: Anthony Bagnall
Author: Jason Lines