An Anthropocentric Approach to Text Extraction from WWW Images
There is a significant need to analyse the text in images on WWW pages, both for effective indexing and for presentation by non-visual means (e.g., audio). This paper argues that the extraction of text from such images benefits from an anthropocentric approach in the distinction between colour regions. The novelty of the idea is the use of a human perspective of colour perception in preference to RGB colour space analysis. This enables the extraction of text in complex situations such as in the presence of varying colour and texture (characters and background). More precisely, characters are extracted as distinct regions with separate chromaticity and/or luminance by performing a layer decomposition of the image. The method described here is the first in our systematic approach to approximate the human colour perception characteristics for the identification of character regions. In this instance, the image is decomposed by performing histogram analysis of Hue and Luminance and merging in the HLS colour space.
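As a rough illustration of the histogram step mentioned in the abstract, the sketch below converts RGB pixels to HLS and builds coarse Hue and Luminance histograms, then groups pixel indices into layers by luminance bin. It is a minimal sketch and not the authors' implementation: the function names, the bin counts, the choice to skip achromatic pixels in the hue histogram, and the luminance-only layering are all assumptions made for this example, and the merging stage is not shown.

```python
import colorsys


def hue_luminance_histograms(pixels, hue_bins=12, lum_bins=8):
    """Build coarse Hue and Luminance histograms for (r, g, b) pixels in 0..255.

    Hypothetical helper: bin counts are illustrative, not taken from the paper.
    """
    hue_hist = [0] * hue_bins
    lum_hist = [0] * lum_bins
    for r, g, b in pixels:
        h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
        # Clamp the top edge (l == 1.0) into the last luminance bin.
        lum_hist[min(int(l * lum_bins), lum_bins - 1)] += 1
        # Hue is undefined for achromatic pixels (s == 0); this sketch skips them.
        if s > 0.0:
            hue_hist[min(int(h * hue_bins), hue_bins - 1)] += 1
    return hue_hist, lum_hist


def split_layers_by_luminance(pixels, lum_bins=8):
    """Group pixel indices by luminance bin (assumed stand-in for layer decomposition)."""
    layers = {}
    for index, (r, g, b) in enumerate(pixels):
        _, l, _ = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
        layers.setdefault(min(int(l * lum_bins), lum_bins - 1), []).append(index)
    return layers


if __name__ == "__main__":
    # Two red pixels, one blue, one white, one black.
    sample = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (255, 255, 255), (0, 0, 0)]
    hue_hist, lum_hist = hue_luminance_histograms(sample)
    print("hue histogram:", hue_hist)
    print("luminance histogram:", lum_hist)
    print("layers:", split_layers_by_luminance(sample))
```

Working in HLS rather than RGB keeps chromatic and brightness differences on separate axes, which is what allows each to be analysed with its own histogram.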
web document analysis, image analysis, text extraction
515-525
Antonacopoulos, Apostolos
9369bee5-b30f-4d4c-a63d-fe54984578cc
Karatzas, Dimosthenis
4d7e3927-2252-4039-88a4-0daca766e943
2000
Antonacopoulos, Apostolos and Karatzas, Dimosthenis (2000) An Anthropocentric Approach to Text Extraction from WWW Images. 4th IAPR International Workshop on Document Analysis Systems, Rio de Janeiro, Brazil. pp. 515-525.
Record type:
Conference or Workshop Item
(Paper)
Text
DAS2000_Antonacopoulos.pdf
- Other
More information
Published date: 2000
Additional Information:
Event Dates: December 2000
Venue - Dates:
4th IAPR International Workshop on Document Analysis Systems, Rio de Janeiro, Brazil, 2000-12-01
Organisations:
Electronics & Computer Science
Identifiers
Local EPrints ID: 263495
URI: http://eprints.soton.ac.uk/id/eprint/263495
PURE UUID: f5e8ac11-9e7c-44de-a3c5-196c337487e2
Catalogue record
Date deposited: 19 Feb 2007
Last modified: 14 Mar 2024 07:32
Contributors
Author:
Apostolos Antonacopoulos
Author:
Dimosthenis Karatzas