The University of Southampton
University of Southampton Institutional Repository

A data parallel approach for large-scale Gaussian process modelling


Choudhury, A., Nair, P.B. and Keane, A.J. (2002) A data parallel approach for large-scale Gaussian process modelling. In Proceedings of the Second SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics. pp. 95-111.

Record type: Conference or Workshop Item (Paper)

Abstract

This paper proposes a data parallel local learning methodology for large-data regression within the Gaussian Process (GP) modeling paradigm. The proposed model achieves parallelism by employing a specialized compactly supported covariance function defined over spatially localized clusters. The load balancing constraints arising from data parallelism are satisfied using a novel greedy clustering algorithm, GeoClust, which produces balanced clusters localized in space. Further, the use of the proposed covariance function as a building block for GP models is shown to decompose the maximum likelihood estimation problem into smaller decoupled subproblems. The attendant benefits, which include a significant reduction in training complexity and sparse predictive models for the posterior mean and variance, make the present scheme attractive. Experimental investigations on real and synthetic data demonstrate that the approach can consistently outperform the state-of-the-art Bayesian Committee Machine (BCM), which employs a random data partitioning strategy. Finally, extensive evaluations over a grid-based computational infrastructure using the NetSolve distributed computing system show that the approach scales well with data size and could potentially be used in large-scale data mining applications.
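
The decoupling claim in the abstract can be illustrated with a small sketch. It is not the paper's covariance function or the GeoClust algorithm, neither of which is specified in this record. As an assumption, it uses a squared-exponential kernel that is set to zero between points in different clusters, with hand-picked cluster labels and a hypothetical noise term. Under that assumption the covariance matrix is block diagonal, so the GP log marginal likelihood of the full data set equals the sum of the per-cluster log marginal likelihoods, and each cluster could be handled independently.

```python
import math

def sq_exp(x, z, length=1.0):
    # Squared-exponential kernel (an illustrative choice, not the paper's).
    return math.exp(-0.5 * ((x - z) / length) ** 2)

def cov_matrix(xs, labels, noise=1e-2):
    # Covariance is nonzero only within a cluster, giving a block structure.
    n = len(xs)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                K[i][j] = sq_exp(xs[i], xs[j])
        K[i][i] += noise  # assumed observation noise keeps K well conditioned
    return K

def cholesky(A):
    # Lower-triangular Cholesky factor of a symmetric positive definite matrix.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def log_marginal_likelihood(K, y):
    # Zero-mean GP: -0.5 y^T K^-1 y - 0.5 log|K| - (n/2) log(2 pi).
    n = len(y)
    L = cholesky(K)
    z = []
    for i in range(n):  # forward substitution solves L z = y
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * logdet - 0.5 * n * math.log(2.0 * math.pi)

# Toy data: two spatially separated clusters with hand-assigned labels.
xs = [0.0, 0.3, 0.7, 5.0, 5.4, 5.9]
ys = [0.1, 0.4, 0.6, -1.0, -0.7, -0.2]
labels = [0, 0, 0, 1, 1, 1]

# Likelihood computed on the full data set at once.
full = log_marginal_likelihood(cov_matrix(xs, labels), ys)

# Likelihood computed cluster by cluster, as independent subproblems.
split = 0.0
for c in sorted(set(labels)):
    idx = [i for i, lab in enumerate(labels) if lab == c]
    xc = [xs[i] for i in idx]
    yc = [ys[i] for i in idx]
    split += log_marginal_likelihood(cov_matrix(xc, [c] * len(idx)), yc)

print(abs(full - split) < 1e-9)
```

In this sketch each per-cluster term depends only on that cluster's data, so the terms could be evaluated on separate workers. With clusters of roughly equal size, the cubic cost of factorizing one large matrix is replaced by several much smaller factorizations.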

Text
chou_02.pdf - Accepted Manuscript

More information

Published date: 2002
Venue - Dates: Second SIAM International Conference on Data Mining, Arlington, VA, 2002-04-01 - 2002-04-01

Identifiers

Local EPrints ID: 21993
URI: http://eprints.soton.ac.uk/id/eprint/21993
PURE UUID: d64319e9-95d2-4986-bc46-6a0d8358e835
ORCID for A.J. Keane: orcid.org/0000-0001-7993-1569

Catalogue record

Date deposited: 29 Mar 2006
Last modified: 16 Mar 2024 02:53


Contributors

Author: A. Choudhury
Author: P.B. Nair
Author: A.J. Keane
