Text Categorization via Ellipsoid Separation

Text categorization, the classification of documents by their semantic content, arises when the documents in a collection have to be ranked by their relevance to a usually predefined set of topics (e.g. web search, or classifying news articles by whether they deal with business topics). In this work we present a new batch learning algorithm for text classification. Our method applies non-linear ellipsoid separation to the vector space representation of text documents. We use the 'bag of words' vector representation of text documents, the maximal separation ratio method for pattern separation via ellipsoids [2], and the kernel GSK algorithm [1] for feature extraction. The method thus combines maximization of the separation ratio with an approximation of latent semantic feature extraction. We present preliminary results that indicate high potential for this approach.
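As an illustration only, the sketch below builds 'bag of words' count vectors and evaluates a separation ratio for one fixed ellipsoid. It is not the authors' algorithm: the keywords point to semidefinite programming for choosing the ellipsoid, whereas here the centre and shape matrix are set by hand. The ratio is assumed to be the largest factor by which the smallest ellipsoid enclosing one class can be scaled before it reaches a point of the other class. The function names, vocabulary, and toy documents are all hypothetical.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text, vocabulary):
    """Map a document to a term-count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


def quadratic_form(x, center, shape):
    """Return (x - c)^T A (x - c), with the shape matrix A as nested lists."""
    d = [xi - ci for xi, ci in zip(x, center)]
    n = len(d)
    return sum(d[i] * shape[i][j] * d[j] for i in range(n) for j in range(n))


def separation_ratio(inside, outside, center, shape):
    """Assumed definition: scale the ellipsoid so it just contains `inside`,
    then return the largest factor rho such that no `outside` point lies
    within rho times that ellipsoid."""
    q_in = max(quadratic_form(x, center, shape) for x in inside)
    q_out = min(quadratic_form(x, center, shape) for x in outside)
    return sqrt(q_out / q_in)


# Hypothetical toy data: two "business" and two "sport" documents.
vocabulary = ["market", "shares", "profit", "match", "goal"]
business = [bag_of_words(t, vocabulary)
            for t in ("market shares profit", "profit market market")]
sport = [bag_of_words(t, vocabulary)
         for t in ("match goal goal", "goal match match match")]

# Hand-picked ellipsoid: centred on the business mean, identity shape (a ball).
center = [sum(col) / len(business) for col in zip(*business)]
shape = [[1.0 if i == j else 0.0 for j in range(len(vocabulary))]
         for i in range(len(vocabulary))]

ratio = separation_ratio(business, sport, center, shape)
print(f"separation ratio: {ratio:.3f}")
```

A ratio above 1 means the two classes are separated by this ellipsoid; a learning method would search over the centre and shape matrix to make the ratio as large as possible.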

Keywords: Text categorization, ellipsoid separation, semidefinite programming, latent semantic indexing, feature extraction.

19-20

Kharechko, Andriy

Shawe-Taylor, John

Herbrich, Ralf

Graepel, Thore

2004

Kharechko, Andriy, Shawe-Taylor, John, Herbrich, Ralf and Graepel, Thore (2004) Text Categorization via Ellipsoid Separation. Postgraduate Research Conference in Electronics, Photonics, Communications & Networks, and Computing Science (PREP2004), University of Hertfordshire, Hatfield, United Kingdom. 05 - 07 Apr 2004. pp. 19-20.

Record type: Conference or Workshop Item (Poster)

Published date: 2004

Additional Information: Event Dates: 5 - 7 April 2004

Venue - Dates: Postgraduate Research Conference in Electronics, Photonics, Communications & Networks, and Computing Science (PREP2004), University of Hertfordshire, Hatfield, United Kingdom, 2004-04-05 - 2004-04-07

Organisations: Electronics & Computer Science
