Implementation Challenges for Nastaliq Character Recognition
Character recognition in cursive scripts or handwritten Latin script has recently attracted researchers’ attention, and some research has been done in this area. Optical character recognition (OCR) is the translation of optically scanned bitmaps of printed or written text into digitally editable data files. OCR systems developed for many of the world's languages are already in use, but none exists for Urdu Nastaliq, a calligraphic adaptation of the Arabic script, just as Jawi is for Malay. Urdu Nastaliq has 39 characters against 28 in Arabic. Each character has two to four different shapes according to its position in the word: initial, medial, final and isolated. In Nastaliq, inter-word and intra-word overlapping makes optical recognition more complex; character recognition of the Latin script is comparatively easier. This paper reports research on Urdu Nastaliq OCR, discusses the challenges, and suggests a new solution for its implementation.
Sattar, Sohail A.
Haque, Shamsul
Pathan, Mahmood K.
Gee, Quintin
Sattar, Sohail A., Haque, Shamsul, Pathan, Mahmood K. and Gee, Quintin (2008) Implementation Challenges for Nastaliq Character Recognition. International Multi Topic Conference (IMTIC'08), Jamshoro, Sindh, Pakistan. 11-12 Apr 2008. (Submitted)
Record type: Conference or Workshop Item (Paper)
Text
ASattar_85.doc
- Version of Record
More information
Submitted date: July 2008
Additional Information:
Event Dates: 11-12 April 2008
Venue - Dates:
International Multi Topic Conference (IMTIC'08), Jamshoro, Sindh, Pakistan, 2008-04-11 - 2008-04-12
Organisations:
Electronics & Computer Science
Identifiers
Local EPrints ID: 266510
URI: http://eprints.soton.ac.uk/id/eprint/266510
PURE UUID: d8527fe2-7cc2-4dfd-8f2a-9c57f90906f3
Catalogue record
Date deposited: 05 Aug 2008 08:56
Last modified: 14 Mar 2024 08:29
Contributors
Author: Sohail A. Sattar
Author: Shamsul Haque
Author: Mahmood K. Pathan
Author: Quintin Gee