A synthesised word approach to word retrieval in handwritten documents
 
  
  
  4225-4236
  
    
      Liang, Y.
      
        e6019ef2-d232-4bce-a224-fa21984a61d8
      
     
  
    
      Fairhurst, M.C.
      
        6a82d154-93fe-4657-bcee-934d5c888192
      
     
  
    
      Guest, Richard
      
        93533dbd-b101-491b-83cc-39ccfdc18165
      
     
  
  
   
  
  
    
    
  
    
    
  
    
    
    
  
  
    
  
       
    
 
  
    
      
  
  
  
  
  
  
    Liang, Y., Fairhurst, M.C. and Guest, Richard
  
  
  
  
   
    (2012)
  
  
    
    A synthesised word approach to word retrieval in handwritten documents.
  
  
  
  
    Pattern Recognition, 45 (12), 4225-4236.
  
   (doi:10.1016/j.patcog.2012.05.024). 
  
  
   
  
  
  
  
  
   
  
    
      
        
          Abstract
          Recent technological advances have enhanced the computer-based indexing and searching of digitised printed books. The performance now achievable in this domain, however, does not at present extend to handwritten texts, which inherently contain more significant letter-based variation within their content. Furthermore, most studies that address the handwritten text retrieval problem require a large training dataset, which very often influences the context and search lexicon. In this paper, a novel method is described to overcome the training data problem using a character-based modelling approach (termed grapheme spectrum) and a word modelling technique (termed synthesised word), enabling the retrieval of keywords that have not explicitly been seen in the training set. When tested on an illustrative historical manuscript, the proposed word retrieval technique shows a clear performance advantage over existing methods.
        
       
    
    
   
  
  
  More information
  
    
      Accepted/In Press date: 29 May 2012
 
    
      e-pub ahead of print date: 13 June 2012
 
    
      Published date: 1 December 2012
 
    
  
  
    
  
    
  
    
  
    
  
    
  
    
     
        Keywords:
        Handwriting analysis, Digital archives, Handwritten word retrieval, Word spotting, Information retrieval, Handwriting recognition, Historical manuscript analysis
      
    
  
    
  
    
  
  
        Identifiers
        Local EPrints ID: 489651
        URI: http://eprints.soton.ac.uk/id/eprint/489651
        
          
        
        
        
          ISSN: 0031-3203
        
        
          PURE UUID: dd43d876-663b-4975-892b-ab6b1897e493
        
  
    
        
          
        
    
        
          
        
    
        
          
            
              
            
          
        
    
  
  Catalogue record
  Date deposited: 30 Apr 2024 16:41
  Last modified: 01 May 2024 02:10
   
   
  
 
 
  
    
    
      Contributors
      
          
          Author:
          
            
            
              Y. Liang
            
          
        
      
          
          Author:
          
            
            
              M.C. Fairhurst
            
          
        
      
          
          Author:
          
            
              
              
                Richard Guest
              
              
                 
              
            
            
          
         
      
      
      
    
  
   
  