The University of Southampton
University of Southampton Institutional Repository

An architecture for management of large, distributed, scientific data

University of Southampton
Papiani, Mark
14debc02-788b-4009-ac89-5883fe5fc606

Papiani, Mark (2000) An architecture for management of large, distributed, scientific data. University of Southampton, Doctoral Thesis.

Record type: Thesis (Doctoral)

Abstract

This thesis describes research into Web-based management of non-traditional data. Three prototype systems are discussed: GBIS, DBbrowse and EASIA, each of which provides examples of new ideas in this area.

In 1999, concepts from GBIS and DBbrowse were used as the starting point for examining new architectures for archiving scientific datasets. Data from numerical simulations generated by the UK Turbulence Consortium was used as a case study. Because the datasets produced are large, new Web-based mechanisms were required for the storage, searching, retrieval and manipulation of simulation results in the hundreds-of-gigabytes range. A prototype architecture and user interface, EASIA (Extensible Architecture for Scientific Data Archives) [182] [183], is described. EASIA demonstrates several new concepts for active digital libraries of scientific data. Result files are archived in place, thereby avoiding the costs associated with transmitting results to a centralised site. The method used shows that a database can meet the apparently divergent requirements of storing both the relatively small simulation result metadata and the large, distributed result files in a unified, secure way. EASIA also shows that separating user interface specification from user interface processing can simplify the extensibility of such systems. EASIA archives not only data in a distributed fashion, but also applications. These are loosely coupled to the archived datasets via a user interface specification file that uses a vocabulary defined by a markup language. Archived applications can provide reusable, dynamic, server-side post-processing operations. This can reduce the bandwidth required for requested data through server-side data reduction. The archive allows post-processing to be performed directly, without the cost of having to rematerialise to files, and it also reduces access bottlenecks and processor loading at individual sites.

Text
757800.pdf - Version of Record
Available under License University of Southampton Thesis Licence.

More information

Published date: 2000

Identifiers

Local EPrints ID: 464197
URI: http://eprints.soton.ac.uk/id/eprint/464197
PURE UUID: af00f06b-8011-45a7-ab91-2fab65a02a19

Catalogue record

Date deposited: 04 Jul 2022 21:32
Last modified: 16 Mar 2024 19:20

Contributors

Author: Mark Papiani
