Distributed data management for large scale applications
Improvements in data storage and network technologies, the emergence of new high-resolution scientific instruments, the widespread use of the Internet and the World Wide Web and even globalisation have contributed to the emergence of new large-scale data-intensive applications.
These applications require new systems that allow users to store, share and process data across computing centres around the world. Worldwide distributed data management is particularly important when there is a lot of data, more than can fit in a single computer or even in a single data centre. Designing systems to cope with the demanding requirements of these applications is the focus of the present work.
This thesis presents four contributions. First, it introduces a set of design principles that can be used to create distributed data management systems for data-intensive applications. Second, it describes an architecture and implementation that follows the proposed design principles, and which results in a scalable, fault tolerant and secure system. Third, it presents the system evaluation, which occurred under real operational conditions using close to one hundred computing sites and with more than 14 petabytes of data. Fourth, it proposes novel algorithms to model the behaviour of file transfers on a wide-area network.
This work also presents a detailed description of the problem of managing distributed data, ranging from the collection of requirements to the identification of the uncertainty that underlies a large distributed environment. This includes a critique of existing work and the identification of practical limits to the development of transfer algorithms on a shared distributed environment.
The motivation for this work has been the ATLAS Experiment for the Large Hadron Collider (LHC) at CERN, where the author was responsible for the development of the data management middleware.
University of Southampton
de Oliveira Branco, Miguel
November 2009
de Roure, David
Zaluska, Ed
de Oliveira Branco, Miguel (2009) Distributed data management for large scale applications. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 191pp.
Record type: Thesis (Doctoral)
Text: thesis (Version of Record), restricted to Repository staff only
More information
Submitted date: September 2009
Published date: November 2009
Organisations: University of Southampton
Identifiers
Local EPrints ID: 72283
URI: http://eprints.soton.ac.uk/id/eprint/72283
PURE UUID: 0bb90ffa-3824-408f-82cc-15cfecf7a6d4
Catalogue record
Date deposited: 05 Feb 2010
Last modified: 13 Mar 2024 21:23
Contributors
Author: Miguel de Oliveira Branco
Thesis advisor: David de Roure
Thesis advisor: Ed Zaluska