Accessing Textual Information Embedded in Internet Images
Antonacopoulos, Apostolos, Karatzas, Dimosthenis and Ortiz Lopez, J (2001) Accessing Textual Information Embedded in Internet Images. In SPIE Internet Imaging II, San Jose, USA. SPIE, pp. 198-205.
Description/Abstract
Indexing and searching for WWW pages relies on analysing text. Current technology cannot process the text embedded in images on WWW pages. This paper argues that this is a significant problem, as text in image form is usually semantically important (e.g. headers, titles). The results of a recent study are presented to show that the majority (76%) of words embedded in images do not appear elsewhere in the main text and that the majority (56%) of ALT tag descriptions of images are incorrect or do not exist at all. Research under way to devise tools that extract text from images, based on the way humans perceive colour differences, is outlined, and results are presented.
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Additional Information: | Event Dates: January 2001 |
| Keywords: | Web document analysis, image analysis, text extraction |
| Divisions: | Faculty of Physical and Applied Science > Electronics and Computer Science |
| Item ID: | 263506 |
| Date Deposited: | 19 Feb 2007 |
| Last Modified: | 15 Aug 2012 03:16 |
| Contributors: | Antonacopoulos, Apostolos (Author) Karatzas, Dimosthenis (Author) Ortiz Lopez, J (Author) |
| Date: | 2001 |
| Status: | Published |
| Publisher: | SPIE |
| ISI Citation Count: | 5 |
| URI: | http://eprints.soton.ac.uk/id/eprint/263506 |