The University of Southampton
University of Southampton Institutional Repository

A step towards machine translation between communication symbols and Arabic text


Alzaben, Lama Abdullah (2021) A step towards machine translation between communication symbols and Arabic text. University of Southampton, Doctoral Thesis, 179pp.

Record type: Thesis (Doctoral)

Abstract

Pictographic symbols may be used as an alternative form of communication by individuals with complex communication needs. These symbols are a collection of drawings that depict concepts, often associated with glosses that express the meaning in spoken language. This research investigates the problem of translating between symbols and text by focusing on one particular language, Modern Standard Arabic (MSA), and one set of symbols, the ARASAAC set. The outcomes are generalisable to other symbol sets and spoken languages. Symbols can be used as part of electronic speech-generating devices. Users may use symbols to express messages or to understand received messages. Thus, translating to and from text is an important task that increases communication with a wider community. Translating text to symbols requires awareness of the exact sense of the textual units that are part of the input message, in order to determine the relevant symbols. Translating from symbols to text, on the other hand, requires, in addition to the associated glosses, an awareness of grammar and word sequence likelihoods in order to generate fully formed sentences. Machine translation has often been tackled using methods that require large amounts of data. This data needs to match the domain and cover the same source and target registers that the translation system is expected to handle. However, a parallel corpus of pictographic symbols and MSA is currently unavailable. This research addresses this issue by proposing an approach that creates the required training data by using existing multilingual textual resources to resolve ambiguity. The outcomes were a corpus automatically tagged with morphological annotations and pictographic symbols, the approach followed, and an investigation of the data involved and produced. The availability of this symbol-tagged corpus is a step towards Arabic symbol/text translation. This has the potential to enhance communication for those requiring Arabic speech output from symbol messaging, and to provide a better understanding of the complexities of automated symbol/text translation processes in Arabic.

Text
final_thesis _29Nov - Version of Record
Available under License University of Southampton Thesis Licence.
Download (7MB)
Text
PTD_Thesis_Alzaben-SIGNED
Restricted to Repository staff only

More information

Published date: November 2021

Identifiers

Local EPrints ID: 473769
URI: http://eprints.soton.ac.uk/id/eprint/473769
PURE UUID: 53f5d74c-5c12-4622-99fd-4fbdd6e403a7
ORCID for Lama Abdullah Alzaben: orcid.org/0000-0001-9521-8093

Catalogue record

Date deposited: 31 Jan 2023 17:42
Last modified: 17 Mar 2024 07:40

Contributors

Author: Lama Abdullah Alzaben
Thesis advisor: Michael Wald
