Semantics-Based Content Extraction in Typewritten Historical Documents
This paper presents a flexible approach to extracting content from scanned historical documents using semantic information. The final electronic document is the result of a "digital historical document lifecycle" process, in which the expert knowledge of the historian/archivist user is incorporated at different stages. Results show that a conversion strategy that is aided by (expert) user-specified semantic information, and that enables individual parts of the document to be processed in a specialised way, produces better results (in a variety of significant ways) than document analysis and understanding techniques devised for contemporary documents.
Historical documents, digital libraries, text enhancement, image analysis
Pages: 48-53
Antonacopoulos, Apostolos
9369bee5-b30f-4d4c-a63d-fe54984578cc
Karatzas, Dimosthenis
4d7e3927-2252-4039-88a4-0daca766e943
2005
Antonacopoulos, Apostolos and Karatzas, Dimosthenis (2005) Semantics-Based Content Extraction in Typewritten Historical Documents. 8th International Conference on Document Analysis and Recognition (ICDAR2005), Seoul, Korea.
Record type: Conference or Workshop Item (Paper)
Text
ICDAR2005_Antonacopoulos.pdf
- Other
More information
Published date: 2005
Additional Information:
Event Dates: August 29 - September 1, 2005
Venue - Dates:
8th International Conference on Document Analysis and Recognition (ICDAR2005), Seoul, Korea, 2005-08-29
Organisations:
Electronics & Computer Science
Identifiers
Local EPrints ID: 263542
URI: http://eprints.soton.ac.uk/id/eprint/263542
PURE UUID: b2754228-5890-414a-9fab-ab4687f9ef34
Catalogue record
Date deposited: 19 Feb 2007
Last modified: 14 Mar 2024 07:34
Contributors
Author:
Apostolos Antonacopoulos
Author:
Dimosthenis Karatzas