The University of Southampton
University of Southampton Institutional Repository

Two Approaches for Text Segmentation in Web Images

Karatzas, Dimosthenis and Antonacopoulos, Apostolos (2003) Two Approaches for Text Segmentation in Web Images. 7th International Conference on Document Analysis and Recognition (ICDAR2003), Edinburgh. pp. 131-136.

Record type: Conference or Workshop Item (Paper)

Abstract

There is a significant need to recognise the text in images on web pages, both for effective indexing and for presentation by non-visual means (e.g., audio). This paper presents and compares two novel methods for segmenting characters for subsequent extraction and recognition. The novelty of both approaches lies in combining topological features of characters (different in each case) with an anthropocentric perspective of colour perception, in preference to RGB space analysis. Both approaches enable the extraction of text in complex situations, such as when colour and texture vary in both the characters and the background.

More information

Published date: 2003
Additional Information: Event Dates: August 2003
Venue - Dates: 7th International Conference on Document Analysis and Recognition (ICDAR2003), Edinburgh, 2003-08-01
Keywords: Text segmentation, web document analysis, text extraction, fuzzy, image analysis, colour
Organisations: Electronics & Computer Science

Identifiers

Local EPrints ID: 263516
URI: http://eprints.soton.ac.uk/id/eprint/263516
PURE UUID: cd8a2662-955b-44ca-ad96-ecc6aecdaf05

Catalogue record

Date deposited: 19 Feb 2007
Last modified: 14 Mar 2024 07:33

Contributors

Author: Dimosthenis Karatzas
Author: Apostolos Antonacopoulos
