The University of Southampton
University of Southampton Institutional Repository

The Lifecycle of a Digital Historical Document: Structure and Content


Antonacopoulos, Apostolos, Karatzas, Dimosthenis, Krawczyk, Henryk and Wiszniewski, Bogdan (2004) The Lifecycle of a Digital Historical Document: Structure and Content. ACM Symposium on Document Engineering (DocEng2004), Milwaukee, Wisconsin, United States. 28-30 Oct 2004. pp. 147-154.

Record type: Conference or Workshop Item (Paper)

Abstract

This paper describes the lifecycle of a digital historical document, from template-based structure definition, through content extraction from the scanned pages, to its final reconstitution as an electronic document that combines content and semantic information. It also describes the tools that have been created to realise each stage in the lifecycle. The approach is presented in the context of different types of typewritten documents relating to prisoners in World War II concentration camps, and is the result of a multinational collaboration under the MEMORIAL project, funded (€1.5M) by the European Union (www.memorial-project.info). Extensive tests with historians and archivists, together with an evaluation of the content extraction results, indicate that the semantics-driven approach as a whole outperforms both manual transcription and the semi-automated application of off-the-shelf OCR and the use of a conventional (text and layout) document format.

More information

Published date: 2004
Additional Information: Event Dates: October 28-30, 2004
Venue - Dates: ACM Symposium on Document Engineering (DocEng2004), Milwaukee, Wisconsin, United States, 2004-10-28 - 2004-10-30
Keywords: Historical documents, digital libraries, document engineering, document architecture, text enhancement, document analysis
Organisations: Electronics & Computer Science

Identifiers

Local EPrints ID: 263538
URI: http://eprints.soton.ac.uk/id/eprint/263538
PURE UUID: bfe8d3ea-de24-467c-976e-816e573a11ca

Catalogue record

Date deposited: 19 Feb 2007
Last modified: 14 Mar 2024 07:35

Contributors

Author: Apostolos Antonacopoulos
Author: Dimosthenis Karatzas
Author: Henryk Krawczyk
Author: Bogdan Wiszniewski
