A semi-supervised regression model for mixed numerical and categorical variables
In this paper, we develop a semi-supervised regression algorithm to analyze data sets that contain both categorical and numerical attributes. The algorithm partitions a data set into several clusters and, at the same time, fits a multivariate regression model to each cluster. This framework combines multivariate regression models for numerical variables (supervised learning methods) with k-modes clustering algorithms for categorical variables (unsupervised learning methods). The regression and k-modes parameters are estimated simultaneously by minimizing a function that is the weighted sum of the least-squares errors in the multivariate regression models and the dissimilarity measures among the categorical variables. Experiments on both synthetic and real data sets demonstrate the effectiveness of the proposed method.
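The objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fit_mixed_clusters`, the weight `gamma`, the single response `y`, the simple matching (Hamming) dissimilarity, and the alternating update scheme are all assumptions chosen for illustration.

```python
import numpy as np

def fit_mixed_clusters(X, y, C, k, gamma=1.0, n_iter=20, seed=0):
    """Alternate between refitting per-cluster models and reassigning points.

    X: (n, p) numerical predictors, y: (n,) response,
    C: (n, q) integer-coded categorical attributes.
    gamma is a hypothetical weight on the categorical mismatch term.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.column_stack([np.ones(n), X])      # add an intercept column
    labels = rng.integers(0, k, size=n)        # random initial partition
    betas = np.zeros((k, Xb.shape[1]))
    modes = np.zeros((k, C.shape[1]), dtype=C.dtype)

    for _ in range(n_iter):
        for j in range(k):
            members = labels == j
            if not members.any():
                continue  # keep the previous parameters for an empty cluster
            # Least-squares fit of the regression model for cluster j.
            betas[j] = np.linalg.lstsq(Xb[members], y[members], rcond=None)[0]
            # Most frequent value of each categorical attribute in cluster j.
            for a in range(C.shape[1]):
                values, counts = np.unique(C[members, a], return_counts=True)
                modes[j, a] = values[np.argmax(counts)]

        # Cost of placing each point in each cluster: squared residual plus
        # gamma times the number of attributes that differ from the cluster mode.
        residual = (y[:, None] - Xb @ betas.T) ** 2
        mismatch = (C[:, None, :] != modes[None, :, :]).sum(axis=2)
        cost = residual + gamma * mismatch
        new_labels = cost.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels

    objective = cost[np.arange(n), labels].sum()
    return labels, betas, modes, objective
```

Under these assumptions, a larger `gamma` makes the categorical attributes dominate the partition, while `gamma = 0` reduces the procedure to clusterwise least-squares regression. Like other alternating schemes, it can stop at a local minimum that depends on the initial partition.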
clustering, regression, data mining, numerical variables, categorical variables
1745-1752
Ng, Michael K.
73e40b29-3e72-4610-9b01-53100f448dfe
Chan, Elaine Y.
f28cc986-99a4-4d90-b65d-3268379981d7
So, M.C.
c6922ccf-547b-485e-8b74-a9271e6225a2
Ching, Wai-Ki
69a2f904-e8f9-488c-9fef-1b615ffd302a
1 June 2007
Ng, Michael K., Chan, Elaine Y., So, M.C. and Ching, Wai-Ki
(2007)
A semi-supervised regression model for mixed numerical and categorical variables.
Pattern Recognition, 40 (16), 1745-1752.
(doi:10.1016/j.patcog.2006.06.018).
This record has no associated files available for download.
More information
Published date: 1 June 2007
Identifiers
Local EPrints ID: 180719
URI: http://eprints.soton.ac.uk/id/eprint/180719
ISSN: 0031-3203
PURE UUID: 47e8a94e-764f-4643-a82b-ae48832aeb21
Catalogue record
Date deposited: 13 Apr 2011 14:53
Last modified: 15 Mar 2024 03:29
Contributors
Author: Michael K. Ng
Author: Elaine Y. Chan
Author: Wai-Ki Ching