A semi-supervised regression model for mixed numerical and categorical variables


Ng, Michael K., Chan, Elaine Y., So, M.C. and Ching, Wai-Ki (2007) A semi-supervised regression model for mixed numerical and categorical variables. Pattern Recognition, 40, (6), 1745-1752. (doi:10.1016/j.patcog.2006.06.018).

Description/Abstract

In this paper, we develop a semi-supervised regression algorithm to analyze data sets which contain both categorical and numerical attributes. This algorithm partitions the data sets into several clusters and at the same time fits a multivariate regression model to each cluster. This framework allows one to incorporate both multivariate regression models for numerical variables (supervised learning methods) and k-mode clustering algorithms for categorical variables (unsupervised learning methods). The estimates of regression models and k-mode parameters can be obtained simultaneously by minimizing a function which is the weighted sum of the least-square errors in the multivariate regression models and the dissimilarity measures among the categorical variables. Both synthetic and real data sets are presented to demonstrate the effectiveness of the proposed method.
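The abstract does not state the objective function explicitly, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a simple matching dissimilarity for the categorical attributes (as in k-modes), a single weight gamma balancing the two terms, random initial labels, and an alternating scheme that refits each cluster's least-squares model and categorical mode, then reassigns points. The function name fit_clusterwise and all parameter names are hypothetical.

```python
import numpy as np

def fit_clusterwise(X_num, y, X_cat, k, gamma=1.0, n_iter=20, seed=0):
    """Alternate between fitting per-cluster models and reassigning points.

    Assumed cost of point i in cluster j (an illustration, not the paper's
    exact formulation):
        (y_i - [1, x_i] @ beta_j)^2 + gamma * #mismatches(cat_i, mode_j)
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    A = np.column_stack([np.ones(n), X_num])      # design matrix with intercept
    labels = rng.integers(k, size=n)              # random initial partition
    betas = np.zeros((k, A.shape[1]))             # regression coefficients per cluster
    modes = np.zeros((k, X_cat.shape[1]), dtype=X_cat.dtype)

    for _ in range(n_iter):
        # Update step: least-squares fit and categorical mode for each cluster.
        for j in range(k):
            idx = labels == j
            if not idx.any():
                continue                          # empty cluster keeps its old parameters
            betas[j] = np.linalg.lstsq(A[idx], y[idx], rcond=None)[0]
            for c in range(X_cat.shape[1]):
                vals, counts = np.unique(X_cat[idx, c], return_counts=True)
                modes[j, c] = vals[np.argmax(counts)]

        # Assignment step: weighted sum of squared error and mismatch count.
        sq_err = (y[:, None] - A @ betas.T) ** 2                          # shape (n, k)
        mismatch = (X_cat[:, None, :] != modes[None, :, :]).sum(axis=2)   # shape (n, k)
        cost = sq_err + gamma * mismatch
        new_labels = cost.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                 # partition is stable
        labels = new_labels

    total = cost[np.arange(n), labels].sum()      # objective at the returned partition
    return labels, betas, modes, total
```

Here X_cat is expected to hold integer-coded categories, one column per attribute. A larger gamma makes the categorical attributes dominate the assignment, while gamma = 0 reduces the sketch to plain clusterwise regression. Like other alternating schemes of this kind, it can stop at a local minimum, so several random seeds may be worth trying.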

Item Type: Article
ISSNs: 0031-3203 (print)
Keywords: clustering, regression, data mining, numerical variables, categorical variables
Subjects: H Social Sciences > HA Statistics
H Social Sciences > HD Industries. Land use. Labor > HD28 Management. Industrial Management
Q Science > QA Mathematics
Divisions: University Structure - Pre August 2011 > School of Management
ePrint ID: 180719
Date Deposited: 13 Apr 2011 14:53
Last Modified: 27 Mar 2014 19:34
URI: http://eprints.soton.ac.uk/id/eprint/180719
