The University of Southampton
University of Southampton Institutional Repository

A language-independent approach to automatic text difficulty assessment for second-language learners

Association for Computational Linguistics
Shen, Wade
f57346e2-187e-4a27-b153-f77006128f32
Williams, Jennifer
3a1568b4-8a0b-41d2-8635-14fe69fbb360
Marius, Tamas
e63a1673-e928-4122-b245-60064a92782f
Salesky, Elisabeth
169fcc87-7fb8-47a4-a162-1eb68b0bf039

Shen, Wade, Williams, Jennifer, Marius, Tamas and Salesky, Elisabeth (2013) A language-independent approach to automatic text difficulty assessment for second-language learners. In Proceedings of the Second Workshop on Predicting and Improving Text Readability for Target Reader Populations. Association for Computational Linguistics. 9 pp.

Record type: Conference or Workshop Item (Paper)

Abstract

In this paper, we introduce a new baseline for language-independent text difficulty assessment applied to the Interagency Language Roundtable (ILR) proficiency scale. We demonstrate that reading level assessment is a discriminative problem that is best suited to regression. Our baseline uses z-normalized shallow length features and TF-LOG-weighted bag-of-words vectors for Arabic, Dari, English, and Pashto. We compare Support Vector Machines and the Margin-Infused Relaxed Algorithm, measured by mean squared error. We provide an analysis of which features are most predictive of a given level.
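
The feature pipeline named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The three length features, the regex tokenizer, the use of log(1 + term frequency) as the "TF-LOG" weight, and all function names are assumptions made for illustration; the record does not define them. The regression models themselves (Support Vector Machines and the Margin-Infused Relaxed Algorithm) are omitted; only feature construction and the mean squared error metric are shown.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    # Simple word-character tokenization (an assumption); Arabic, Dari, or
    # Pashto text would likely need a language-aware tokenizer in practice.
    return re.findall(r"\w+", text.lower())


def length_features(text):
    # Three hypothetical shallow length features: word count,
    # average word length, and average sentence length in words.
    words = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    avg_word_len = mean(len(w) for w in words) if words else 0.0
    avg_sent_len = n_words / len(sentences) if sentences else 0.0
    return [float(n_words), avg_word_len, avg_sent_len]


def z_normalize(rows):
    # Column-wise z-score: (x - mean) / standard deviation over the corpus.
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]  # guard against zero variance
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


def tf_log_vector(text, vocabulary):
    # Assumed weighting: log(1 + term frequency) for each vocabulary word.
    counts = Counter(tokenize(text))
    return [math.log(1 + counts[w]) for w in vocabulary]


def mean_squared_error(predicted, gold):
    # Average squared difference between predicted and gold levels.
    return mean((p - g) ** 2 for p, g in zip(predicted, gold))


# Toy two-document corpus, invented for this example.
docs = [
    "The cat sat. The dog ran.",
    "Interagency committees evaluate multilingual proficiency assessments annually.",
]
vocabulary = sorted({w for d in docs for w in tokenize(d)})
shallow = z_normalize([length_features(d) for d in docs])
# Each document vector: normalized length features followed by term weights.
features = [s + tf_log_vector(d, vocabulary) for s, d in zip(shallow, docs)]
```

Each row of `features` could then be passed to any regressor that predicts a continuous difficulty level, with `mean_squared_error` comparing its predictions against gold ILR levels.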

This record has no associated files available for download.

More information

Published date: 9 August 2013

Identifiers

Local EPrints ID: 470362
URI: http://eprints.soton.ac.uk/id/eprint/470362
PURE UUID: 0851360f-f9a9-44d4-b59d-eae4ba50dffa
ORCID for Jennifer Williams: orcid.org/0000-0003-1410-0427

Catalogue record

Date deposited: 07 Oct 2022 16:30
Last modified: 17 Mar 2024 04:12

Contributors

Author: Wade Shen
Author: Jennifer Williams
Author: Tamas Marius
Author: Elisabeth Salesky
