The University of Southampton
University of Southampton Institutional Repository

An Arabic Sign Language Corpus for Instructional Language in School


Almohimeed, Abdulaziz, Wald, Mike and Damper, Robert (2010) An Arabic Sign Language Corpus for Instructional Language in School. LREC 2010: 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies, Malta, 17-22 May 2010, pp. 81-82.

Record type: Conference or Workshop Item (Paper)

Abstract

Machine translation (MT) technology has made significant progress over the last decade and now offers Arabic sign language (ArSL) signers the potential to access text published in Arabic. The dominant model of MT is now corpus-based; in this model, translation accuracy correlates directly with the size and coverage of the corpus. Such a corpus is normally a collection of translation examples constructed from existing documents such as books and newspapers. However, no writing system for sign language (SL) comparable to those used for spoken languages has yet been developed. Hence, no SL documents exist, which complicates the construction of an SL corpus. In countries such as Ireland and Germany, a number of corpora have already been developed from scratch and used for MT. No ArSL corpus for MT exists, so a new ArSL corpus covering instructional language had to be created. The goal of building this corpus is to develop an automatic translation system from Arabic text to ArSL. This paper presents the ArSL corpus for instructional language constructed for use in schools, and the methodology used to create it. The corpus was collected at the College of Computer and Information Sciences at Imam Muhammad bin Saud University in Riyadh, Saudi Arabia. A group of interpreters and native signers with backgrounds in education was involved in this work. The corpus was constructed by collecting instructional sentences used daily in schools for the deaf. The syntax and morphology of each sentence were then manually analysed. Each sentence was individually translated, recorded on video, and stored in MPEG format. The corpus contains video data from three native signers. The videos were then annotated using the ELAN annotation tool. The annotated video data contain isolated signs accompanied by detailed information, such as manual and non-manual features. The final step in constructing the corpus was to create a bilingual dictionary from the annotated videos.
The corpus comprises two main parts. The first part is the annotated video data, comprising isolated signs with detailed information such as manual and non-manual features. It also contains the Arabic translation script, including syntax and morphology details. The second part is the bilingual dictionary, delivered with the annotated videos.
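As an illustration only, the final step described in the abstract (deriving a bilingual dictionary from annotated videos) can be sketched in Python. This is not the authors' implementation: the record does not describe the corpus file format, so the `SignAnnotation` fields, the `build_dictionary` helper, the file names and the sample entries below are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SignAnnotation:
    """One isolated sign in an annotated video (field names are hypothetical)."""
    arabic_word: str   # Arabic word the sign translates
    signer: str        # identifier of the native signer
    video_file: str    # recording that contains the sign
    start_ms: int      # start of the sign within the video
    end_ms: int        # end of the sign within the video
    manual: str        # free-text manual features (e.g. handshape)
    non_manual: str    # free-text non-manual features (e.g. facial expression)


def build_dictionary(annotations):
    """Group annotated sign clips by the Arabic word they translate."""
    entries = defaultdict(list)
    for annotation in annotations:
        entries[annotation.arabic_word].append(annotation)
    return dict(entries)


# Invented sample data, for illustration only (not taken from the corpus).
sample = [
    SignAnnotation("كتاب", "signer1", "s001.mpg", 0, 800, "flat hands", "neutral"),
    SignAnnotation("افتح", "signer1", "s001.mpg", 800, 1500, "flat hands", "raised brows"),
    SignAnnotation("كتاب", "signer2", "s014.mpg", 200, 950, "flat hands", "neutral"),
]
dictionary = build_dictionary(sample)
```

Keying the dictionary on the Arabic word keeps every recorded variant of a sign, across signers, available under one entry; a real system would more likely read these tiers from ELAN's exported annotation files than from hand-written records.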

More information

Published date: 23 May 2010
Additional Information: Event Dates: 17-22 May 2010
Venue - Dates: LREC 2010: 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies, Malta, 2010-05-17 - 2010-05-22
Organisations: Web & Internet Science, Southampton Wireless Group

Identifiers

Local EPrints ID: 271106
URI: https://eprints.soton.ac.uk/id/eprint/271106
ISBN: 2-9517408-6-7
PURE UUID: ad96f7f0-811c-4e4f-a2c7-0b0c8d11322b

Catalogue record

Date deposited: 29 May 2010 11:25
Last modified: 18 Jul 2017 06:46

Contributors

Author: Abdulaziz Almohimeed
Author: Mike Wald
Author: Robert Damper
