Extracting latent structures in numerical classification: An investigation using two factor models
We investigate the use of SVD-based two-factor models for numerical data classification. Motivations for such a study include the widespread success of such models (e.g., LSI) in textual information retrieval, emerging connections with well-established statistical techniques, and the increasing occurrence of mixed-mode (text-and-numeric) data.
A direct extension and an efficient modification of the LSI model, both applied to numerical data problems, are presented, and the associated problems and likely remedies are discussed. The techniques under investigation are shown to perform competitively with popular existing numerical classification techniques on a range of synthetic and real-world benchmark data. In particular, we show that the modified LSI proposed in this work avoids confronting the optimal subspace selection problem, yet generalizes well and remains computationally efficient for large data.
1842-1846
Choudhury, Arindum
Ong, YewSoon
Keane, Andy J.
2002
Choudhury, Arindum, Ong, YewSoon and Keane, Andy J. (2002) Extracting latent structures in numerical classification: An investigation using two factor models. 9th International Conference on Neural Information Processing. ICONIP'02, Singapore. 18 - 22 Nov 2002.
Record type: Conference or Workshop Item (Paper)
Text: chou_02a.pdf - Accepted Manuscript
More information
Published date: 2002
Venue - Dates: 9th International Conference on Neural Information Processing. ICONIP'02, Singapore, 2002-11-18 - 2002-11-22
Identifiers
Local EPrints ID: 22259
URI: http://eprints.soton.ac.uk/id/eprint/22259
PURE UUID: dd049725-ffa0-4824-a5fd-9bb5003ae1f7
Catalogue record
Date deposited: 10 Jul 2006
Last modified: 16 Mar 2024 02:53
Contributors
Author:
Arindum Choudhury
Author:
YewSoon Ong