Semantics-Based Content Extraction in Typewritten Historical Documents


Antonacopoulos, Apostolos and Karatzas, Dimosthenis (2005) Semantics-Based Content Extraction in Typewritten Historical Documents. In, 8th International Conference on Document Analysis and Recognition (ICDAR2005), Seoul, Korea, 29 Aug - 01 Sep 2005. IEEE-CS Press, 48-53.


Description/Abstract

This paper presents a flexible approach to extracting content from scanned historical documents using semantic information. The final electronic document is the result of a "digital historical document lifecycle" process, in which the expert knowledge of the historian/archivist user is incorporated at different stages. Results show that such a conversion strategy, aided by (expert) user-specified semantic information and enabling individual parts of the document to be processed in a specialised way, produces results that are superior, in a variety of significant ways, to those of document analysis and understanding techniques devised for contemporary documents.

Item Type: Conference or Workshop Item (Paper)
Additional Information: Event Dates: August 29 - September 1, 2005
Keywords: Historical documents, digital libraries, text enhancement, image analysis
Divisions: Faculty of Physical and Applied Science > Electronics and Computer Science
Item ID: 263542
Date Deposited: 19 Feb 2007
Last Modified: 02 Mar 2012 12:20
Contributors: Antonacopoulos, Apostolos (Author)
Karatzas, Dimosthenis (Author)
Date: 2005
Status: Published
Publisher: IEEE-CS Press
ISI Citation Count: 4
URI: http://eprints.soton.ac.uk/id/eprint/263542
