The University of Southampton
University of Southampton Institutional Repository

An Anthropocentric Approach to Text Extraction from WWW Images

Antonacopoulos, Apostolos and Karatzas, Dimosthenis (2000) An Anthropocentric Approach to Text Extraction from WWW Images. 4th IAPR International Workshop on Document Analysis Systems, Rio de Janeiro, Brazil. pp. 515-525.

Record type: Conference or Workshop Item (Paper)

Abstract

There is a significant need to analyse the text in images on WWW pages, both for effective indexing and for presentation by non-visual means (e.g., audio). This paper argues that the extraction of text from such images benefits from an anthropocentric approach in the distinction between colour regions. The novelty of the idea is the use of a human perspective of colour perception in preference to RGB colour space analysis. This enables the extraction of text in complex situations such as in the presence of varying colour and texture (characters and background). More precisely, characters are extracted as distinct regions with separate chromaticity and/or luminance by performing a layer decomposition of the image. The method described here is the first in our systematic approach to approximate the human colour perception characteristics for the identification of character regions. In this instance, the image is decomposed by performing histogram analysis of Hue and Luminance and merging in the HLS colour space.
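
The decomposition idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the bin count, the saturation threshold, and the dark/light split are all assumptions chosen for illustration, and the merging step mentioned in the abstract is not attempted. The sketch converts RGB pixels to HLS with Python's standard `colorsys` module, builds a Hue histogram over chromatic pixels, and assigns each pixel to a layer by Hue (chromatic pixels) or by Luminance (near-grey pixels, whose hue is not meaningful).

```python
import colorsys

BINS = 12             # assumed number of hue bins (illustrative only)
MIN_SATURATION = 0.2  # assumed threshold below which hue is treated as unreliable


def to_hls(pixel):
    """Convert an (R, G, B) pixel with 0-255 channels to (hue, lightness, saturation) in [0, 1]."""
    r, g, b = (channel / 255.0 for channel in pixel)
    return colorsys.rgb_to_hls(r, g, b)


def hue_bin(hue, bins=BINS):
    """Map a hue in [0, 1) to a bin centred on multiples of 1/bins; hue is circular, so wrap around."""
    return int(hue * bins + 0.5) % bins


def hue_histogram(pixels, bins=BINS, min_saturation=MIN_SATURATION):
    """Count chromatic pixels per hue bin, skipping near-grey pixels."""
    histogram = [0] * bins
    for pixel in pixels:
        hue, _lightness, saturation = to_hls(pixel)
        if saturation >= min_saturation:
            histogram[hue_bin(hue, bins)] += 1
    return histogram


def split_layers(pixels, bins=BINS, min_saturation=MIN_SATURATION):
    """Group pixel indices into layers keyed by hue bin, or by dark/light for achromatic pixels."""
    layers = {}
    for index, pixel in enumerate(pixels):
        hue, lightness, saturation = to_hls(pixel)
        if saturation >= min_saturation:
            key = ("hue", hue_bin(hue, bins))
        else:
            key = ("lum", "dark" if lightness < 0.5 else "light")
        layers.setdefault(key, []).append(index)
    return layers


if __name__ == "__main__":
    # Hypothetical pixels: two shades of red "text", two shades of blue "background", white, black.
    sample = [(255, 0, 0), (230, 20, 40), (0, 0, 255), (20, 20, 230), (255, 255, 255), (0, 0, 0)]
    print(hue_histogram(sample))
    print(split_layers(sample))
```

In this sketch the two reds fall into one layer even though their RGB values differ, which is the kind of grouping that separating chromaticity from luminance is meant to make easier. Fixed hue bins are a simplification: colours near a bin boundary can still be split across layers, which is presumably one reason a merging stage is needed in a full method.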

Text
DAS2000_Antonacopoulos.pdf - Other
Download (83kB)

More information

Published date: 2000
Additional Information: Event Dates: December 2000
Venue - Dates: 4th IAPR International Workshop on Document Analysis Systems, Rio de Janeiro, Brazil, 2000-12-01
Keywords: web document analysis, image analysis, text extraction
Organisations: Electronics & Computer Science

Identifiers

Local EPrints ID: 263495
URI: http://eprints.soton.ac.uk/id/eprint/263495
PURE UUID: f5e8ac11-9e7c-44de-a3c5-196c337487e2

Catalogue record

Date deposited: 19 Feb 2007
Last modified: 14 Mar 2024 07:32

Contributors

Author: Apostolos Antonacopoulos
Author: Dimosthenis Karatzas
