A step towards machine translation between communication symbols and Arabic text

Pictographic symbols may be used as an alternative form of communication by individuals with complex communication needs. These symbols are a collection of drawings that depict concepts, often associated with glosses that express the meaning in spoken language. This research investigates translation between symbols and text by focusing on one particular language, Modern Standard Arabic (MSA), and one set of symbols, the ARASAAC set. The outcomes are generalisable to other symbol sets and spoken languages. Symbols can be used as part of electronic speech-generating devices. Users may use symbols to express messages or to understand received messages. Thus, translating to and from text is an important task that increases communication with a wider community. Translating text to symbols requires awareness of the exact sense of the textual units in the input message in order to determine the relevant symbols. Translating from symbols to text, on the other hand, requires, in addition to the associated glosses, an awareness of grammar and word-sequence likelihoods in order to generate fully formed sentences. Machine translation has often been tackled using methods that require large amounts of data. This data needs to match the domain and cover the same source and target registers that a translation system is expected to handle. However, a parallel corpus of pictographic symbols and MSA is currently unavailable. This research addresses this issue by proposing an approach that creates the required training data by making use of existing multilingual textual resources to resolve ambiguity. The outcomes were a corpus automatically tagged with morphological annotations and pictographic symbols, the approach followed, and an investigation of the data involved and produced. The availability of this symbol-tagged corpus is a step towards Arabic symbol/text translation.
This has the potential to enhance communication for those requiring Arabic speech output from symbol messaging, and provide a better understanding of the complexities of automated symbol/text translation processes in Arabic.

University of Southampton

Alzaben, Lama Abdullah

November 2021

Wald, Michael

Alzaben, Lama Abdullah (2021) A step towards machine translation between communication symbols and Arabic text. University of Southampton, Doctoral Thesis, 179pp.

Record type: Thesis (Doctoral)