Text Categorization via Ellipsoid Separation
Text categorization, the classification of documents by their semantic content, arises when the documents in a collection must be ranked by their relevance to a usually predefined set of topics (e.g. web search, or sorting news articles by whether they deal with business topics). In this work we present a new batch learning algorithm for text classification. Our method applies non-linear ellipsoid separation to the vector space representation of text documents. We use the 'bag of words' vector representation of text documents, the maximal separation ratio method for pattern separation via ellipsoids [2], and the kernel GSK algorithm [1] for feature extraction. The method thereby combines maximization of the separation ratio with an approximation of latent semantic feature extraction. We present some preliminary results that indicate high potential for this approach.
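As an illustration of the 'bag of words' representation mentioned in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes whitespace tokenization and lowercasing, and the function names `build_vocabulary` and `bag_of_words` are hypothetical. Each document becomes a vector of term counts over a shared vocabulary; the ellipsoid separation and kernel GSK steps are not shown.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index (sorted order).

    Assumption: tokens are obtained by splitting on whitespace.
    """
    tokens = {tok for doc in documents for tok in doc.lower().split()}
    return {tok: idx for idx, tok in enumerate(sorted(tokens))}


def bag_of_words(document, vocabulary):
    """Return the term-count vector of one document over the vocabulary.

    Tokens that are not in the vocabulary are ignored.
    """
    vector = [0] * len(vocabulary)
    for tok, count in Counter(document.lower().split()).items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = count
    return vector


if __name__ == "__main__":
    docs = ["markets fall", "markets rise markets"]
    vocab = build_vocabulary(docs)
    for doc in docs:
        print(doc, "->", bag_of_words(doc, vocab))
```

Word order is discarded in this representation; only the counts per vocabulary term are kept, which is what allows each document to be treated as a point in a vector space.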
Text categorization, ellipsoid separation, semidefinite programming, latent semantic indexing, feature extraction.
19-20
Kharechko, Andriy
Shawe-Taylor, John
Herbrich, Ralf
Graepel, Thore
2004
Kharechko, Andriy, Shawe-Taylor, John, Herbrich, Ralf and Graepel, Thore (2004) Text Categorization via Ellipsoid Separation. Postgraduate Research Conference in Electronics, Photonics, Communications & Networks, and Computing Science (PREP2004), University of Hertfordshire, Hatfield, United Kingdom. 05 - 07 Apr 2004.
Record type: Conference or Workshop Item (Poster)
Text
prep2004.doc
- Other
More information
Published date: 2004
Additional Information:
Event Dates: 5 - 7 April 2004
Venue - Dates:
Postgraduate Research Conference in Electronics, Photonics, Communications & Networks, and Computing Science (PREP2004), University of Hertfordshire, Hatfield, United Kingdom, 2004-04-05 - 2004-04-07
Organisations:
Electronics & Computer Science
Identifiers
Local EPrints ID: 263417
URI: http://eprints.soton.ac.uk/id/eprint/263417
PURE UUID: 7b2a2a21-ae64-4608-9152-b936e960980c
Catalogue record
Date deposited: 13 Feb 2007
Last modified: 14 Mar 2024 07:31
Contributors
Author:
Andriy Kharechko
Author:
John Shawe-Taylor
Author:
Ralf Herbrich
Author:
Thore Graepel