An architecture for management of large, distributed, scientific data
This thesis describes research into Web-based management of non-traditional data. Three prototype systems are discussed, GBIS, DBbrowse and EASIA, each of which provided examples of new ideas in this area.
In 1999, concepts from GBIS and DBbrowse were used as the starting point for examining new architectures for archiving scientific datasets. Data from numerical simulations generated by the UK Turbulence Consortium was used as a case study. Because the datasets produced are large, new Web-based mechanisms were required for storage, searching, retrieval and manipulation of simulation results in the hundreds of gigabytes range. A prototype architecture and user interface, EASIA (Extensible Architecture for Scientific Data Archives) [182] [183], is described. EASIA demonstrates several new concepts for active digital libraries of scientific data. Result files are archived in place, thereby avoiding the costs associated with transmitting results to a centralised site. The method used shows that a database can meet the apparently divergent requirements of storing both the relatively small simulation result metadata and the large, distributed result files in a unified, secure way. EASIA also shows that separating user interface specification from user interface processing can simplify the extensibility of such systems. EASIA archives not only data in a distributed fashion, but also applications. These are loosely coupled to the archived datasets via a user interface specification file that uses a vocabulary defined by a markup language. Archived applications can provide reusable, dynamic, server-side post-processing operations. This can reduce bandwidth requirements for requested data through server-side data reduction. The archive allows post-processing to be performed directly without the cost of having to rematerialise to files, and it also reduces access bottlenecks and processor loading at individual sites.
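The bandwidth argument for server-side data reduction can be illustrated with a minimal sketch. It is not taken from EASIA: the toy field, the `extract_plane` operation and the byte encoding are all hypothetical and chosen only to show why returning a reduced result costs less than shipping a whole result file.

```python
import struct


def make_field(nx, ny, nz):
    """Build a toy 3D scalar field as a flat list, with x varying fastest."""
    return [float(x + 10 * y + 100 * z)
            for z in range(nz) for y in range(ny) for x in range(nx)]


def extract_plane(field, nx, ny, z):
    """Hypothetical server-side operation: return a single z-plane."""
    start = z * nx * ny
    return field[start:start + nx * ny]


def encode(values):
    """Serialise values as little-endian 8-byte floats for transfer."""
    return struct.pack(f"<{len(values)}d", *values)


nx, ny, nz = 8, 8, 8
field = make_field(nx, ny, nz)

# Without reduction, the client downloads the whole field.
full_payload = encode(field)

# With reduction, the archive site runs the operation and sends one plane.
plane_payload = encode(extract_plane(field, nx, ny, z=3))

reduction = len(full_payload) // len(plane_payload)
print(len(full_payload), len(plane_payload), reduction)
```

In this toy case the transferred payload shrinks by a factor equal to the number of planes in the field; the saving for real simulation results would depend on the operation and the data layout.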
University of Southampton
Papiani, Mark
14debc02-788b-4009-ac89-5883fe5fc606
2000
Papiani, Mark (2000) An architecture for management of large, distributed, scientific data. University of Southampton, Doctoral Thesis.
Record type: Thesis (Doctoral)
Text: 757800.pdf - Version of Record
More information
Published date: 2000
Identifiers
Local EPrints ID: 464197
URI: http://eprints.soton.ac.uk/id/eprint/464197
PURE UUID: af00f06b-8011-45a7-ab91-2fab65a02a19
Catalogue record
Date deposited: 04 Jul 2022 21:32
Last modified: 16 Mar 2024 19:20
Contributors
Author: Mark Papiani