A synthesised word approach to word retrieval in handwritten documents

Recent technological advances have improved the computer-based indexing and searching of digitised printed books. However, the performance now achievable in this domain does not yet extend to handwritten texts, which inherently contain much greater letter-level variation. Furthermore, most studies that address the handwritten text retrieval problem require a large training dataset, which very often influences the context and the search lexicon. This paper describes a novel method that overcomes the training data problem by combining a character-based modelling approach (termed grapheme spectrum) with a word modelling technique (termed synthesised word), enabling the retrieval of keywords that have not explicitly been seen in the training set. When tested on an illustrative historical manuscript, the proposed word retrieval technique shows a clear advantage over existing methods.

Keywords: Handwriting analysis, Digital archives, Handwritten word retrieval, Word spotting, Information retrieval, Handwriting recognition, Historical manuscript analysis

DOI: 10.1016/j.patcog.2012.05.024

ISSN: 0031-3203

Pages: 4225-4236

Authors: Liang, Y.; Fairhurst, M.C.; Guest, Richard

1 December 2012


Liang, Y., Fairhurst, M.C. and Guest, Richard (2012) A synthesised word approach to word retrieval in handwritten documents. Pattern Recognition, 45 (12), 4225-4236. (doi:10.1016/j.patcog.2012.05.024).

Record type: Article

Accepted/In Press date: 29 May 2012

e-pub ahead of print date: 13 June 2012

Published date: 1 December 2012