A step towards machine translation between communication symbols and Arabic text
University of Southampton
Alzaben, Lama Abdullah
November 2021
Wald, Michael
Alzaben, Lama Abdullah (2021) A step towards machine translation between communication symbols and Arabic text. University of Southampton, Doctoral Thesis, 179pp.
Record type: Thesis (Doctoral)
Abstract
Pictographic symbols may be used as an alternative form of communication by individuals with complex communication needs. These symbols are a collection of drawings that depict concepts, often associated with glosses that express their meaning in spoken language. Symbols can be used as part of electronic speech-generating devices, where users may use them to express messages or to understand received messages. Translating between symbols and text is therefore an important task that widens communication with the broader community. This research investigates the translation problem by focusing on one language, Modern Standard Arabic (MSA), and one set of symbols, the ARASAAC set. The outcomes are generalisable to other symbol sets and spoken languages. Translating text to symbols requires awareness of the exact sense of each textual unit in the input message in order to determine the relevant symbols. Translating from symbols to text, on the other hand, requires, in addition to the associated glosses, an awareness of grammar and word sequence likelihoods in order to generate fully formed sentences. Machine translation has often been tackled using methods that require large amounts of data. These data need to match the domain and cover the same source and target registers that the translation system is expected to handle. However, a parallel corpus of pictographic symbols and MSA is currently unavailable. This research addresses the gap by proposing an approach that creates the required training data, making use of existing multilingual textual resources to resolve ambiguity. The outcomes are a corpus automatically tagged with morphological annotations and pictographic symbols, the approach followed to build it, and an investigation of the data involved and produced. The availability of this symbol-tagged corpus is a step towards Arabic symbol/text translation. It has the potential to enhance communication for those requiring Arabic speech output from symbol messaging, and to provide a better understanding of the complexities of automated symbol/text translation in Arabic.
Text: final_thesis _29Nov - Version of Record
Text: PTD_Thesis_Alzaben-SIGNED - Restricted to Repository staff only
More information
Published date: November 2021
Identifiers
Local EPrints ID: 473769
URI: http://eprints.soton.ac.uk/id/eprint/473769
PURE UUID: 53f5d74c-5c12-4622-99fd-4fbdd6e403a7
Catalogue record
Date deposited: 31 Jan 2023 17:42
Last modified: 17 Mar 2024 07:40
Contributors
Author: Lama Abdullah Alzaben
Thesis advisor: Michael Wald