University of Southampton Institutional Repository

A synthesised word approach to word retrieval in handwritten documents

Liang, Y., Fairhurst, M.C. and Guest, Richard (2012) A synthesised word approach to word retrieval in handwritten documents. Pattern Recognition, 45 (12), 4225-4236. (doi:10.1016/j.patcog.2012.05.024).

Record type: Article

Abstract

Recent technological advances have enhanced the computer-based indexing and searching of digitised printed books. The performance now achievable in this domain, however, does not at present extend to handwritten texts, which inherently contain more significant letter-based variation within their content. Furthermore, most studies that address the handwritten text retrieval problem require a large training dataset, which very often influences the context and search lexicon. In this paper, a novel method is described to overcome the training data problem using a character-based modelling approach (termed grapheme spectrum) and a word modelling technique (termed synthesised word), enabling the retrieval of keywords that have not explicitly been seen in the training set. When tested on an illustrative historical manuscript, the proposed word retrieval technique shows a clear performance advantage over existing methods.

This record has no associated files available for download.

More information

Accepted/In Press date: 29 May 2012
e-pub ahead of print date: 13 June 2012
Published date: 1 December 2012
Keywords: Handwriting analysis, Digital archives, Handwritten word retrieval, Word spotting, Information retrieval, Handwriting recognition, Historical manuscript analysis

Identifiers

Local EPrints ID: 489651
URI: http://eprints.soton.ac.uk/id/eprint/489651
ISSN: 0031-3203
PURE UUID: dd43d876-663b-4975-892b-ab6b1897e493
ORCID for Richard Guest: orcid.org/0000-0001-7535-7336

Catalogue record

Date deposited: 30 Apr 2024 16:41
Last modified: 01 May 2024 02:10

Contributors

Author: Y. Liang
Author: M.C. Fairhurst
Author: Richard Guest
