The University of Southampton
University of Southampton Institutional Repository

Semantics-Based Content Extraction in Typewritten Historical Documents

Antonacopoulos, Apostolos and Karatzas, Dimosthenis (2005) Semantics-Based Content Extraction in Typewritten Historical Documents. 8th International Conference on Document Analysis and Recognition (ICDAR2005). pp. 48-53.

Record type: Conference or Workshop Item (Paper)

Abstract

This paper presents a flexible approach to extracting content from scanned historical documents using semantic information. The final electronic document is the result of a "digital historical document lifecycle" process, in which the expert knowledge of the historian/archivist user is incorporated at different stages. Results show that such a conversion strategy, aided by (expert) user-specified semantic information and enabling individual parts of the document to be processed in a specialised way, produces results that are superior (in a variety of significant ways) to those of document analysis and understanding techniques devised for contemporary documents.

PDF
ICDAR2005_Antonacopoulos.pdf - Other
Download (536kB)

More information

Published date: 2005
Additional Information: Event Dates: August 29 - September 1, 2005
Venue - Dates: 8th International Conference on Document Analysis and Recognition (ICDAR2005), 2005-08-29
Keywords: Historical documents, digital libraries, text enhancement, image analysis
Organisations: Electronics & Computer Science

Identifiers

Local EPrints ID: 263542
URI: https://eprints.soton.ac.uk/id/eprint/263542
PURE UUID: b2754228-5890-414a-9fab-ab4687f9ef34

Catalogue record

Date deposited: 19 Feb 2007
Last modified: 18 Jul 2017 07:44

Contributors

Author: Apostolos Antonacopoulos
Author: Dimosthenis Karatzas