Semi-supervised constrained clustering with cluster outlier filtering
347-354
Bravo, Cristian
b22c4145-644e-40ee-85d8-431c59c3c71b
Weber, Richard
da9918d6-bc84-4c98-8ffe-2aaf7b58cf1b
San Martin, Cesar
c4ee8d1f-ee88-47c9-bdc0-216a37d33ba4
Kim, Sang-Woon
0e60a735-7917-4417-98cd-ca03744afe24
2011
Bravo, Cristian and Weber, Richard; San Martin, Cesar and Kim, Sang-Woon (eds.) (2011) Semi-supervised constrained clustering with cluster outlier filtering. Lecture Notes in Computer Science, 7042, 347-354. (doi:10.1007/978-3-642-25085-9_41).
Abstract
Constrained clustering addresses the problem of creating minimum-variance clusters with the added complexity that the elements in each cluster must fulfill a set of constraints. Research in this area has focused on “must-link” and “cannot-link” constraints, in which pairs of elements must be in the same or in different clusters, respectively. In this work we present a heuristic procedure for clustering into two classes when the restrictions affect all the elements of the two clusters and depend on the elements present in each cluster. This problem is highly susceptible to outliers in each cluster (extreme values that create infeasible solutions), so the procedure eliminates elements with extreme values in both clusters while still achieving adequate performance measures. Experiments performed on a company database reveal a great deal of information, with results that are more readily interpretable than those of classical k-means clustering.
This record has no associated files available for download.
More information
Published date: 2011
Organisations:
Southampton Business School
Identifiers
Local EPrints ID: 396681
URI: http://eprints.soton.ac.uk/id/eprint/396681
ISSN: 0302-9743
PURE UUID: 69121f0e-66fa-4dd6-b2e8-32217d6ff75a
Catalogue record
Date deposited: 10 Jun 2016 10:29
Last modified: 15 Mar 2024 03:33
Contributors
Author:
Cristian Bravo
Author:
Richard Weber
Editor:
Cesar San Martin
Editor:
Sang-Woon Kim