University of Southampton Institutional Repository

Visual Representation of Text in Web Documents and Its Interpretation


Karatzas, Dimosthenis and Antonacopoulos, Apostolos (2004) Visual Representation of Text in Web Documents and Its Interpretation. In, Malcolm, G.R. (ed.) Multidisciplinary Approaches to Visual Representations and Interpretations. Elsevier, pp. 181-196.

Record type: Book Section

Abstract

This paper examines the uses of text and its representation on Web documents in terms of the challenges in its interpretation. Particular attention is paid to the significant problem of non-uniform representation of text. This non-uniformity is mainly due to the presence of semantically important text in image form as opposed to the standard encoded text. The issues surrounding text representation in Web documents are discussed in the context of colour perception and spatial representation. The characteristics of the representation of text in image form are examined and research towards interpreting these images of text is briefly described.


More information

Published date: 2004
Keywords: Web document analysis, text extraction
Organisations: Electronics & Computer Science

Identifiers

Local EPrints ID: 263523
URI: http://eprints.soton.ac.uk/id/eprint/263523
PURE UUID: 23f9a033-3704-4c34-b5a1-2f9df962ee05

Catalogue record

Date deposited: 19 Feb 2007
Last modified: 23 Sep 2020 16:44
