An architecture for management of large, distributed, scientific data

Papiani, Mark (2000) An architecture for management of large, distributed, scientific data. University of Southampton, Doctoral Thesis.

Record type: Thesis (Doctoral)

Abstract

This thesis describes research into Web-based management of non-traditional data. Three prototype systems are discussed: GBIS, DBbrowse and EASIA. Each provided examples of new ideas in this area.

In 1999, concepts from GBIS and DBbrowse were used as the starting point for examining new architectures for archiving scientific datasets. Data from numerical simulations generated by the UK Turbulence Consortium was used as a case study. Due to the large datasets produced, new Web-based mechanisms were required for storage, searching, retrieval and manipulation of simulation results in the hundreds of gigabytes range. A prototype architecture and user interface, EASIA (Extensible Architecture for Scientific Data Archives) [182] [183], is described. EASIA demonstrates several new concepts of active digital libraries of scientific data. Result files are archived in place, thereby avoiding the costs associated with transmitting results to a centralised site. The method used shows that a database can meet the apparently divergent requirements of storing both the relatively small simulation result metadata and the large, distributed result files, in a unified, secure way. EASIA also shows that separating user interface specification from user interface processing can simplify the extensibility of such systems. EASIA archives not only data but also applications in a distributed fashion. These applications are loosely coupled to the archived datasets via a user interface specification file that uses a vocabulary defined by a markup language. Archived applications can provide reusable, dynamic, server-side post-processing operations. This can reduce bandwidth requirements for requested data through server-side data reduction. The archive allows post-processing to be performed directly without the cost of having to rematerialise to files, and it also reduces access bottlenecks and processor loading at individual sites.
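The loose coupling between archived applications and datasets described above can be illustrated with a minimal sketch. It is not EASIA's actual specification format or code: the XML vocabulary (`operations`, `operation`, `param`), the dataset name `velocity_field`, the `REGISTRY` mapping and the two post-processing functions are all hypothetical, chosen only to show how a markup-based specification might select a server-side operation that returns reduced data instead of the full result file.

```python
import xml.etree.ElementTree as ET

# Hypothetical user interface specification. The element and attribute
# names are illustrative assumptions, not the vocabulary used by EASIA.
SPEC = """
<operations dataset="velocity_field">
  <operation name="subsample" label="Keep every Nth value">
    <param name="stride" type="int" default="2"/>
  </operation>
  <operation name="mean" label="Mean value"/>
</operations>
"""


def subsample(values, stride=2):
    """Server-side data reduction: keep every `stride`-th value."""
    return values[::stride]


def mean(values):
    """Server-side data reduction: collapse the values to their mean."""
    return sum(values) / len(values)


# Hypothetical registry of archived applications, keyed by the operation
# name that appears in the specification.
REGISTRY = {"subsample": subsample, "mean": mean}


def load_operations(spec_xml):
    """Parse the specification into {operation name: {param: default}}."""
    root = ET.fromstring(spec_xml)
    operations = {}
    for op in root.findall("operation"):
        params = {}
        for p in op.findall("param"):
            cast = int if p.get("type") == "int" else str
            params[p.get("name")] = cast(p.get("default"))
        operations[op.get("name")] = params
    return operations


def run_operation(spec_xml, name, values, **overrides):
    """Run an operation listed in the specification against `values`."""
    operations = load_operations(spec_xml)
    if name not in operations:
        raise KeyError(f"operation {name!r} is not listed in the specification")
    unknown = set(overrides) - set(operations[name])
    if unknown:
        raise TypeError(f"unknown parameters for {name!r}: {sorted(unknown)}")
    # Defaults come from the specification; the caller may override them.
    args = {**operations[name], **overrides}
    return REGISTRY[name](values, **args)


if __name__ == "__main__":
    data = list(range(10))  # stand-in for a large archived result
    print(run_operation(SPEC, "subsample", data, stride=5))
    print(run_operation(SPEC, "mean", data))
```

In this sketch, adding a new post-processing operation means adding one entry to the specification and one function to the registry. The dispatch code is unchanged, which is the kind of extensibility the separation of specification from processing is meant to give.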

Text

757800.pdf - Version of Record

Available under License University of Southampton Thesis Licence.
