University of Southampton Institutional Repository

Distributed data management for large scale applications

de Oliveira Branco, Miguel (2009) Distributed data management for large scale applications. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 191pp.

Record type: Thesis (Doctoral)

Abstract

Improvements in data storage and network technologies, the emergence of new high-resolution scientific instruments, the widespread use of the Internet and the World Wide Web, and even globalisation have given rise to new large-scale, data-intensive applications.

These applications require new systems that allow users to store, share and process data across computing centres around the world. Worldwide distributed data management is particularly important when the volume of data exceeds what a single computer, or even a single data centre, can hold. Designing systems that meet the demanding requirements of these applications is the focus of this work.

This thesis makes four contributions. First, it introduces a set of design principles for building distributed data management systems for data-intensive applications. Second, it describes an architecture and implementation that follow these principles, resulting in a scalable, fault-tolerant and secure system. Third, it presents an evaluation of the system under real operational conditions, spanning close to one hundred computing sites and more than 14 petabytes of data. Fourth, it proposes novel algorithms to model the behaviour of file transfers on a wide-area network.

This work also gives a detailed description of the problem of managing distributed data, from the collection of requirements to the identification of the uncertainty that underlies a large distributed environment. It includes a critique of existing work and identifies practical limits to the development of transfer algorithms in a shared distributed environment.

The motivation for this work has been the ATLAS Experiment for the Large Hadron Collider (LHC) at CERN, where the author was responsible for the development of the data management middleware.

Files

thesis - Version of Record (3MB), available under the University of Southampton Thesis Licence.
draft.pdf - Other, restricted to repository staff only.

More information

Submitted date: September 2009
Published date: November 2009
Organisations: University of Southampton

Identifiers

Local EPrints ID: 72283
URI: http://eprints.soton.ac.uk/id/eprint/72283
PURE UUID: 0bb90ffa-3824-408f-82cc-15cfecf7a6d4
ORCID for David de Roure: orcid.org/0000-0001-9074-3016

Catalogue record

Date deposited: 05 Feb 2010
Last modified: 13 Mar 2024 21:23

Contributors

Author: Miguel de Oliveira Branco
Thesis advisor: David de Roure
Thesis advisor: Ed Zaluska
