The University of Southampton
University of Southampton Institutional Repository

Using KCCA for Japanese-English cross-language information retrieval and classification


Li, Yaoyong and Shawe-Taylor, John (2005) Using KCCA for Japanese-English cross-language information retrieval and classification. Journal of Intelligent Information Systems, tba (tba).

Record type: Article

Abstract

Kernel Canonical Correlation Analysis (KCCA) is a method for finding correlated linear relationships between two variables in a kernel-defined feature space. We study a machine learning algorithm based on KCCA for cross-language information retrieval and apply it to Japanese-English cross-language information retrieval. The results are encouraging and are significantly better than those obtained by other state-of-the-art methods. Computational complexity is an important issue when applying KCCA to large datasets, as is typical in information retrieval, and we experimentally evaluate several methods for alleviating it. We also investigate cross-language document classification using KCCA as well as other methods. Our results show that it is feasible to use a classifier learned in one language to classify documents in other languages.
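
For readers unfamiliar with the technique, the following is a minimal illustrative sketch of regularized KCCA and of ranking documents in a shared semantic space. It is not taken from the paper: the regularized eigenproblem shown is one common formulation, the linear kernel on synthetic vectors stands in for real text kernels, and the names `center_kernel`, `kcca_fit`, `rank_documents`, and the parameter `kappa` are hypothetical choices made for illustration only.

```python
import numpy as np

def center_kernel(K):
    """Center a kernel matrix in feature space: K_c = H K H, H = I - 11^T/n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca_fit(Kx, Ky, kappa=0.1, n_components=2):
    """Regularized KCCA on two centered n x n kernel matrices (assumed form).

    Solves (Kx + kappa I)^-1 Ky (Ky + kappa I)^-1 Kx alpha = rho^2 alpha,
    then recovers beta = (Ky + kappa I)^-1 Kx alpha / rho.
    Assumes the leading eigenvalues are strictly positive.
    """
    n = Kx.shape[0]
    I = np.eye(n)
    Rx = np.linalg.solve(Kx + kappa * I, Ky)  # (Kx + kappa I)^-1 Ky
    Ry = np.linalg.solve(Ky + kappa * I, Kx)  # (Ky + kappa I)^-1 Kx
    evals, evecs = np.linalg.eig(Rx @ Ry)
    # The eigenvalues are real in exact arithmetic; drop round-off imaginary parts.
    evals, evecs = np.real(evals), np.real(evecs)
    order = np.argsort(evals)[::-1][:n_components]
    rho = np.sqrt(np.clip(evals[order], 0.0, None))
    alpha = evecs[:, order]
    beta = (Ry @ alpha) / rho
    return alpha, beta, rho

def rank_documents(query_proj, doc_proj):
    """Return document indices sorted by decreasing cosine similarity."""
    q = query_proj / np.linalg.norm(query_proj)
    D = doc_proj / np.linalg.norm(doc_proj, axis=1, keepdims=True)
    return np.argsort(-(D @ q))

# Synthetic "paired documents": both views are noisy linear maps of a shared
# latent vector, standing in for a document and its translation.
rng = np.random.default_rng(0)
n = 60
Z = rng.normal(size=(n, 2))
X = Z @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(n, 5))
Y = Z @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(n, 6))

Kx = center_kernel(X @ X.T)  # linear kernel, view 1
Ky = center_kernel(Y @ Y.T)  # linear kernel, view 2
alpha, beta, rho = kcca_fit(Kx, Ky, kappa=0.1, n_components=2)

Px = Kx @ alpha  # view-1 items projected into the shared space
Py = Ky @ beta   # view-2 items projected into the shared space
ranking = rank_documents(Px[0], Py)  # view-2 items ranked for the first view-1 item
```

Solving the eigenproblem on full n x n kernel matrices, as this sketch does, costs roughly cubic time in the number of training pairs, which gives a sense of why the abstract treats computational complexity as a concern for large collections.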

Text
kcca2005.pdf - Other
Download (179kB)

More information

Published date: 2005
Keywords: KCCA cross-language information retrieval algorithm Japanese English kernel
Organisations: Electronics & Computer Science

Identifiers

Local EPrints ID: 260786
URI: http://eprints.soton.ac.uk/id/eprint/260786
PURE UUID: 99aeedd9-0817-4d8d-b1b5-85fccd345504

Catalogue record

Date deposited: 21 Apr 2005
Last modified: 14 Mar 2024 06:43

Contributors

Author: Yaoyong Li
Author: John Shawe-Taylor
