The University of Southampton
University of Southampton Institutional Repository

An application of the nearest correlation matrix to Web document classification


Qi, Hou-Duo, Xia, Zhonghang and Xing, Guangming (2007) An application of the nearest correlation matrix to Web document classification. Journal of Industrial Management and Optimization, 3 (4), 701-713.

Record type: Article

Abstract

A Web document organizes a set of textual data according to a predefined logical structure. It has been shown that grouping Web documents with similar structures can improve query efficiency. An XML document has no vectorial representation, which most existing classification algorithms require. The kernel method has been applied to represent structural data through pairwise similarity, so that a set of Web data can be fed into classification algorithms in the form of a kernel matrix. However, since the distance between a pair of Web documents is usually obtained only approximately, the derived distance matrix is not a kernel matrix. In this paper, we propose to use the nearest correlation matrix (of the estimated distance matrix) as the kernel matrix, which can be computed quickly by a Newton-type method. Experimental studies show that the classification accuracy can be significantly improved.
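
The idea of replacing an estimated similarity matrix with its nearest correlation matrix can be illustrated with a minimal sketch. It does not reproduce the Newton-type method mentioned in the abstract. It uses the simpler alternating-projections scheme with Dykstra's correction instead, and it assumes the input is a symmetric matrix of estimated pairwise similarities. The function names `project_psd` and `nearest_correlation`, the tolerance, and the iteration cap are all illustrative choices, not taken from the paper.

```python
import numpy as np


def project_psd(a):
    """Project a symmetric matrix onto the positive semidefinite cone."""
    w, v = np.linalg.eigh(a)
    x = (v * np.maximum(w, 0.0)) @ v.T  # zero out negative eigenvalues
    return (x + x.T) / 2.0              # remove rounding asymmetry


def nearest_correlation(g, tol=1e-8, max_iter=1000):
    """Approximate the nearest correlation matrix to g in the Frobenius norm.

    Alternating projections with Dykstra's correction (an assumption of this
    sketch; the paper itself uses a Newton-type method).
    """
    y = (g + g.T) / 2.0       # work with the symmetric part of the input
    ds = np.zeros_like(y)     # Dykstra correction for the PSD projection
    for _ in range(max_iter):
        r = y - ds
        x = project_psd(r)    # step 1: nearest PSD matrix
        ds = x - r
        y_new = x.copy()
        np.fill_diagonal(y_new, 1.0)  # step 2: nearest unit-diagonal matrix
        change = np.linalg.norm(y_new - y, "fro")
        gap = np.linalg.norm(y_new - x, "fro")
        y = y_new
        if max(change, gap) <= tol:
            break
    return y


# Hypothetical 3x3 estimated similarity matrix that is not positive semidefinite.
g = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
k = nearest_correlation(g)
```

The result `k` is symmetric with a unit diagonal and is positive semidefinite up to the chosen tolerance, so it could in principle be passed to a classifier that accepts a precomputed kernel matrix.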

This record has no associated files available for download.

More information

Published date: November 2007
Keywords: support vector machines, classification, kernel matrix, semidefinite programming.
Organisations: Operational Research

Identifiers

Local EPrints ID: 54536
URI: http://eprints.soton.ac.uk/id/eprint/54536
PURE UUID: 4b975279-ab8d-47b6-974b-ed2fee300bc7
ORCID for Hou-Duo Qi: orcid.org/0000-0003-3481-4814

Catalogue record

Date deposited: 28 Jul 2008
Last modified: 09 Jan 2022 03:17

Contributors

Author: Hou-Duo Qi
Author: Zhonghang Xia
Author: Guangming Xing
