An Arabic Sign Language Corpus for Instructional Language in School

Machine translation (MT) technology has made significant progress over the last decade and now offers the potential for Arabic sign language (ArSL) signers to access text published in Arabic. The dominant model of MT is now corpus based. In this model, the accuracy of translation correlates directly with the size and coverage of the corpus. The corpus is a collection of translation examples, usually constructed from existing documents such as books and newspapers; however, no writing system for sign language (SL) comparable to those used for spoken languages has yet been developed. Hence, no SL documents exist, which complicates the construction of an SL corpus. In countries such as Ireland and Germany, a number of corpora have already been developed from scratch and used for MT. No ArSL corpus for MT exists, so a new ArSL corpus for instructional language had to be created. The goal of building this corpus is to develop an automatic translation system from Arabic text to ArSL.
This paper presents the ArSL corpus for instructional language constructed for use in schools, and the methodology used to create it. The corpus was collected at the College of Computer and Information Sciences at Imam Muhammad bin Saud University in Riyadh, Saudi Arabia. A group of interpreters and native signers with backgrounds in education were involved in this work. The corpus was constructed by collecting instructional sentences used daily in schools for the deaf. The syntax and morphology of each sentence were then manually analysed. Each sentence was individually translated, recorded on video, and stored in MPEG format. The corpus contains video data from three native signers. The videos were then annotated using the ELAN annotation tool. The annotated video data contain isolated signs accompanied by detailed information, such as manual and non-manual features. The final step in constructing the corpus was to create a bilingual dictionary from the annotated videos.
The corpus comprises two main parts. The first part is the annotated video data, comprising isolated signs with detailed information such as manual and non-manual features. It also contains the Arabic translation script, including syntax and morphology details. The second part is the bilingual dictionary, delivered with the annotated videos.

2-9517408-6-7

81-82

Almohimeed, Abdulaziz

Wald, Mike

Damper, Robert

23 May 2010

Almohimeed, Abdulaziz, Wald, Mike and Damper, Robert (2010) An Arabic Sign Language Corpus for Instructional Language in School. LREC 2010: 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies, Malta. 17-22 May 2010. pp. 81-82.

Record type: Conference or Workshop Item (Paper)

More information

Published date: 23 May 2010

Additional Information: Event Dates: 17-22 May 2010

Venue - Dates: LREC 2010: 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies, Malta, 2010-05-17 - 2010-05-22

Organisations: Web & Internet Science, Southampton Wireless Group

Identifiers

Local EPrints ID: 271106

URI: http://eprints.soton.ac.uk/id/eprint/271106

ISBN: 2-9517408-6-7

PURE UUID: ad96f7f0-811c-4e4f-a2c7-0b0c8d11322b

Catalogue record

Date deposited: 29 May 2010 11:25

Last modified: 14 Mar 2024 09:23

Contributors

Author: Abdulaziz Almohimeed

Author: Mike Wald

Author: Robert Damper
