Long term data integrity for large Audiovisual archives.
At the 2010 Joint Technical Symposium (JTS) at the 66th Congress of the International Federation of Film Archives (FIAF), Oslo, Norway,
02 - 05 May 2010.
In the broadcast and wider AV industry, digital file-based audiovisual archives are rapidly becoming embedded services within networked infrastructures and content-centric production and distribution processes. Online (network accessible) and long-term storage of digital content on commodity IT technology (e.g. disk servers and tape robots) is an increasingly common approach, as is the use of conventional IT solutions for safety, e.g. backup and disaster recovery. But are these solutions safe? Can they assure the data integrity needed for long-term preservation of petabyte volumes of data? The answer is no. Field studies, e.g. by CERN and NetApp, reveal that data corruption can take place silently, without detection or correction, even in 'enterprise class' systems explicitly designed to prevent data loss. We address this problem in the UK TSB supported AVATAR-m and EC supported PrestoPrime projects. Our approach accepts that loss does occur in storage and that archivists need tools to understand and manage this loss. Recent work, not yet presented elsewhere, includes new video encoding techniques (based on the BBC’s Dirac codec) that make video files more robust to data corruption, including the ability to degrade gracefully so that content remains usable (as it used to be in the analogue world). It also includes simulation and modeling work that helps archives understand the risks of using IT storage and what approaches to take (e.g. how many copies to make, how often to check them, what encoding to use, what the costs will be and what losses could happen). In particular, we take into account the sensitivity of the specific data formats used for AV preservation to the various failure modes of the technology used to store them. These new tools and techniques form part of our framework for implementing different preservation and integrity management strategies.
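As a rough illustration of the kind of question such modeling addresses (how many copies to make and how often to check them), the following is a minimal sketch and not the model used in the projects named above. It assumes, purely for illustration, that each copy is corrupted independently with probability `p_corrupt` between two integrity checks, that every check repairs damaged copies from any surviving good copy, and that a file is lost only if all copies are corrupted within the same interval. The function name and parameters are hypothetical.

```python
def loss_probability(p_corrupt: float, copies: int, intervals: int) -> float:
    """Probability of losing a file over a number of check intervals.

    Assumptions (illustrative only): copies fail independently with
    probability p_corrupt per interval, and each check fully repairs
    the file as long as at least one copy is still intact.
    """
    if not 0.0 <= p_corrupt <= 1.0:
        raise ValueError("p_corrupt must be between 0 and 1")
    if copies < 1 or intervals < 0:
        raise ValueError("need at least one copy and a non-negative interval count")
    # The file is lost in an interval only if every copy is corrupted in it.
    per_interval_loss = p_corrupt ** copies
    # Probability that at least one interval results in loss.
    return 1.0 - (1.0 - per_interval_loss) ** intervals


if __name__ == "__main__":
    # Hypothetical parameters: compare two and three copies over 100 checks.
    for n in (2, 3):
        print(n, loss_probability(p_corrupt=0.001, copies=n, intervals=100))
```

Under these assumptions, adding a copy reduces the per-interval loss probability by a factor of `p_corrupt`, while checking more often shortens each interval and so lowers `p_corrupt` itself. Real storage failures are often correlated, which this sketch ignores.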
Policy-based replication of content is used across multiple, distributed and heterogeneous storage locations to provide control over how many copies to make, where to put them and what file formats to use. Automated integrity checking and repair is used to detect and correct corruption. Large AV assets are deconstructed into smaller files, each of which can have its own preservation policy. This allows differential strategies to be used, e.g. for the audio, video and metadata components of an MXF object, depending on the relative needs of each part of the asset for safety, accessibility and longevity. We propose a presentation at JTS2010 to disseminate our work, including highlighting the risks of using IT storage for audiovisual preservation and the tools and techniques available to estimate, quantify, monitor and manage this worrying problem.
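The automated integrity checking and repair described above can be sketched as follows. This is a minimal illustration and not the implementation used in AVATAR-m or PrestoPrime: it assumes that a trusted SHA-256 checksum was recorded for each file at ingest and that replicas are plain files reachable as local paths. The function names `sha256_of` and `check_and_repair` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_and_repair(replicas: list[Path], expected: str) -> dict:
    """Verify each replica against the expected checksum and repair bad ones.

    Assumption: `expected` is a trusted checksum recorded at ingest.
    """
    good = [p for p in replicas if p.exists() and sha256_of(p) == expected]
    bad = [p for p in replicas if p not in good]
    if not good:
        # No intact copy remains, so nothing can be repaired.
        return {"status": "lost", "repaired": []}
    for damaged in bad:
        # Overwrite the missing or corrupted replica with a verified copy.
        shutil.copyfile(good[0], damaged)
    return {"status": "ok", "repaired": bad}
```

Running such a check per component file (e.g. separately for audio, video and metadata) is what lets each part of an asset carry its own copy count and checking schedule.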