Text Extraction from Web Images Based on Human Perception and Fuzzy Inference
Antonacopoulos, Apostolos and Karatzas, Dimosthenis (2001) Text Extraction from Web Images Based on Human Perception and Fuzzy Inference. First International Workshop on Web Document Analysis (WDA2001), Seattle, United States, pp. 35-38.
Record type: Conference or Workshop Item (Paper)
Abstract
There is a significant need to extract and recognise the semantically important text contained in images on Web pages. This paper proposes a new approach to text extraction from this special class of images. The method attempts to emulate, more closely than previous approaches, the way humans perceive colour differences in order to differentiate between text and background regions. Pixels of similar colour (as humans perceive it) are merged into components, and a fuzzy inference mechanism (using connectivity and colour distance features) is devised to group components into larger character-like regions.
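The abstract names three ingredients: a perceptual colour distance, merging of similar-colour pixels into components, and fuzzy inference over connectivity and colour-distance features. The Python sketch below illustrates those ideas under assumptions of its own and is not the authors' algorithm. Assumed (not from the paper): CIE L*a*b* with the CIE76 distance as the perceptual measure, 4-connectivity, the thresholds 10, 30, 0.1 and 0.5, a shared-boundary connectivity measure, and a two-rule zero-order Sugeno inference. All function names are hypothetical.

```python
import math
from collections import deque

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE L*a*b* (D65 white point)."""
    def linearise(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearise(float(v)) for v in rgb)
    # Linear sRGB -> CIE XYZ (D65)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def delta_e(lab1, lab2):
    """CIE76 colour difference (assumed stand-in for the paper's perceptual measure)."""
    return math.dist(lab1, lab2)

def label_components(image, threshold=10.0):
    """Flood-fill 4-connected pixels, chaining neighbours whose Lab distance is below threshold."""
    h, w = len(image), len(image[0])
    lab = [[srgb_to_lab(px) for px in row] for row in image]
    labels = [[-1] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and delta_e(lab[y][x], lab[ny][nx]) < threshold):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return labels, count, lab

def component_features(labels, lab, a, b):
    """Mean-colour distance and normalised shared-boundary length of distinct components a, b."""
    h, w = len(labels), len(labels[0])
    sums = {a: [0.0, 0.0, 0.0, 0], b: [0.0, 0.0, 0.0, 0]}
    contacts = 0
    for y in range(h):
        for x in range(w):
            k = labels[y][x]
            if k in sums:
                s = sums[k]
                for i in range(3):
                    s[i] += lab[y][x][i]
                s[3] += 1
            # Count each a-b adjacency once by looking only right and down
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w and {k, labels[ny][nx]} == {a, b}:
                    contacts += 1
    mean = {k: [s[i] / s[3] for i in range(3)] for k, s in sums.items()}
    distance = delta_e(mean[a], mean[b])
    connectivity = min(1.0, contacts / min(sums[a][3], sums[b][3]))
    return distance, connectivity

def ramp(v, lo, hi):
    """Linear membership function: 0 at or below lo, 1 at or above hi."""
    if v <= lo:
        return 0.0
    if v >= hi:
        return 1.0
    return (v - lo) / (hi - lo)

def merge_degree(distance, connectivity, near=10.0, far=30.0, weak=0.1, strong=0.5):
    """Fuzzy degree (0..1) to which two components should be grouped together."""
    close = 1.0 - ramp(distance, near, far)        # "colours are close"
    connected = ramp(connectivity, weak, strong)   # "components are well connected"
    w_merge = min(close, connected)                # Rule 1: close AND connected -> merge (1)
    w_keep = max(1.0 - close, 1.0 - connected)     # Rule 2: NOT close OR NOT connected -> keep (0)
    total = w_merge + w_keep
    # Zero-order Sugeno defuzzification: weighted average of the rule outputs
    return (w_merge * 1.0 + w_keep * 0.0) / total if total else 0.0
```

For example, a 2×4 image whose left half is black and right half white yields two components with a mean-colour distance of about 100 and a connectivity of 0.5, so `merge_degree` returns 0.0 and the two regions stay separate. Note that neighbour chaining in `label_components` can let colour drift gradually across a smooth gradient; a real system would likely also compare each pixel with the component's running mean.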
More information
Published date: 2001
Additional Information:
Event Dates: September 2001
Venue - Dates:
First International Workshop on Web Document Analysis (WDA2001), Seattle, United States, 2001-09-01
Organisations:
Electronics & Computer Science
Identifiers
Local EPrints ID: 263510
URI: http://eprints.soton.ac.uk/id/eprint/263510
PURE UUID: 631c80f4-ab8d-48cf-83fc-abacceed7a77
Catalogue record
Date deposited: 19 Feb 2007
Last modified: 14 Mar 2024 07:33
Contributors
Author:
Apostolos Antonacopoulos
Author:
Dimosthenis Karatzas