Distributed Data Management for Large Scale Applications

(2009) Distributed Data Management for Large Scale Applications. University of Southampton, IAM/ECS, Doctoral Thesis.




Improvements in data storage and network technologies, the emergence of new high-resolution scientific instruments, the widespread use of the Internet and the World Wide Web, and even globalisation have contributed to the emergence of new large-scale data-intensive applications. These applications require new systems that allow users to store, share and process data across computing centres around the world. Worldwide distributed data management is particularly important when the volume of data exceeds what a single computer, or even a single data centre, can hold. Designing systems to cope with the demanding requirements of these applications is the focus of the present work.

This thesis presents four contributions. First, it introduces a set of design principles that can be used to create distributed data management systems for data-intensive applications. Second, it describes an architecture and implementation that follow the proposed design principles and result in a scalable, fault-tolerant and secure system. Third, it presents the system evaluation, which was carried out under real operational conditions using close to one hundred computing sites and more than 14 petabytes of data. Fourth, it proposes novel algorithms to model the behaviour of file transfers on a wide-area network.

This work also presents a detailed description of the problem of managing distributed data, ranging from the collection of requirements to the identification of the uncertainty that underlies a large distributed environment. This includes a critique of existing work and the identification of practical limits to the development of transfer algorithms in a shared distributed environment. The motivation for this work has been the ATLAS Experiment for the Large Hadron Collider (LHC) at CERN, where the author was responsible for the development of the data management middleware.

Item Type: Thesis (Doctoral)
ePrint ID: 267994
Date Deposited: 05 Oct 2009 13:28
Last Modified: 27 Mar 2014 20:14
URI: http://eprints.soton.ac.uk/id/eprint/267994
