University of Southampton Institutional Repository

Text Categorization via Ellipsoid Separation


Kharechko, Andriy, Shawe-Taylor, John, Herbrich, Ralf and Graepel, Thore (2004) Text Categorization via Ellipsoid Separation. Postgraduate Research Conference in Electronics, Photonics, Communications & Networks, and Computing Science (PREP2004), University of Hertfordshire, Hatfield, United Kingdom. 05 - 07 Apr 2004. pp. 19-20.

Record type: Conference or Workshop Item (Poster)

Abstract

The problem of classifying documents according to their semantic content (text categorization) arises when documents from some collection have to be ranked by their relevance to a set of topics, which is usually predefined (e.g. web search, or classifying news articles by whether they deal with business topics). In this work we present a new batch learning algorithm for text classification. Our method applies non-linear ellipsoid separation to the vector space representation of text documents. We use the 'bag of words' vector representation of text documents, the maximal separation ratio method for pattern separation via ellipsoids [2], and the kernel GSK algorithm [1] for feature extraction. In this way we combine maximization of the separation ratio with an approximation of latent semantic feature extraction. We present some preliminary results which indicate high potential for the approach.
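The feature-extraction part of this pipeline can be illustrated with a short Python sketch. It assumes that the kernel GSK algorithm [1] is the Gram-Schmidt orthogonalization procedure carried out in a kernel-defined feature space, and it makes several simplifying choices that are not taken from the paper: lower-cased whitespace tokenization, raw term counts, and a linear kernel. The names `bag_of_words`, `linear_kernel` and `kernel_gram_schmidt` are hypothetical, and the ellipsoid separation step itself (a semidefinite program) is not shown.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Term-count vectors over a shared, sorted vocabulary.

    Assumption for illustration: lower-cased whitespace tokenization and raw
    counts (no stop-word removal, stemming or tf-idf weighting).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for word, count in Counter(toks).items():
            vec[index[word]] = float(count)
        vectors.append(vec)
    return vocab, vectors


def linear_kernel(x, y):
    """Plain inner product (assumed); any positive semidefinite kernel could be used."""
    return sum(a * b for a, b in zip(x, y))


def kernel_gram_schmidt(vectors, kernel, n_features, tol=1e-12):
    """Greedy Gram-Schmidt orthogonalization in kernel feature space.

    At each step the document with the largest residual norm becomes the next
    pivot, and every document is projected onto that pivot's normalized
    residual. Only kernel evaluations are used. Returns the pivot indices and
    one new feature vector per document.
    """
    n = len(vectors)
    K = [[kernel(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    residual = [K[i][i] for i in range(n)]  # squared residual norms
    rows = []  # rows[t][i] = t-th new feature of document i
    pivots = []
    for _ in range(min(n_features, n)):
        p = max(range(n), key=lambda i: residual[i])
        if residual[p] <= tol:
            break  # every document already lies in the span of the pivots
        norm = math.sqrt(residual[p])
        # Project each document onto the new orthonormal direction.
        row = [
            (K[i][p] - sum(r[i] * r[p] for r in rows)) / norm
            for i in range(n)
        ]
        rows.append(row)
        pivots.append(p)
        for i in range(n):
            residual[i] -= row[i] ** 2
    features = [[r[i] for r in rows] for i in range(n)]
    return pivots, features


if __name__ == "__main__":
    docs = [
        "stocks rise on market news",
        "market falls as stocks drop",
        "team wins the football match",
        "football fans cheer the team",
    ]
    _, X = bag_of_words(docs)
    pivots, feats = kernel_gram_schmidt(X, linear_kernel, n_features=2)
    print("pivot documents:", pivots)
    for doc, f in zip(docs, feats):
        print([round(v, 3) for v in f], doc)
```

With `n_features` equal to the number of documents, inner products of the returned feature vectors reproduce the kernel matrix up to rounding. A smaller `n_features` keeps only the leading directions, giving a low-dimensional representation in the spirit of latent semantic indexing, on which a separator such as an ellipsoid could then be trained.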


More information

Published date: 2004
Additional Information: Event Dates: 5 - 7 April 2004
Venue - Dates: Postgraduate Research Conference in Electronics, Photonics, Communications & Networks, and Computing Science (PREP2004), University of Hertfordshire, Hatfield, United Kingdom, 2004-04-05 - 2004-04-07
Keywords: Text categorization, ellipsoid separation, semidefinite programming, latent semantic indexing, feature extraction.
Organisations: Electronics & Computer Science

Identifiers

Local EPrints ID: 263417
URI: http://eprints.soton.ac.uk/id/eprint/263417
PURE UUID: 7b2a2a21-ae64-4608-9152-b936e960980c

Catalogue record

Date deposited: 13 Feb 2007
Last modified: 14 Mar 2024 07:31


Contributors

Author: Andriy Kharechko
Author: John Shawe-Taylor
Author: Ralf Herbrich
Author: Thore Graepel


