A language-independent approach to automatic text difficulty assessment for second-language learners
In this paper, we introduce a new baseline for language-independent text difficulty assessment applied to the Interagency Language Roundtable (ILR) proficiency scale. We demonstrate that reading level assessment is a discriminative problem that is best suited to regression. Our baseline uses z-normalized shallow length features and TF-LOG weighted bag-of-words vectors for Arabic, Dari, English, and Pashto. We compare Support Vector Machines and the Margin-Infused Relaxed Algorithm, measuring performance by mean squared error. We provide an analysis of which features are most predictive of a given level.
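The feature pipeline named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three length features, the whitespace tokenization, and the reading of TF-LOG as log(1 + term count) are all assumptions made for illustration, and every function name below is hypothetical. The regressors themselves (SVM, MIRA) are left out; only feature extraction and the mean squared error metric are shown.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def length_features(text):
    """Shallow length features (assumed set): word count, mean word
    length, and mean sentence length in words."""
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    words = text.split()  # whitespace tokenization is an assumption
    return [
        float(len(words)),
        mean(len(w) for w in words),
        len(words) / len(sentences),
    ]


def z_normalize(rows):
    """Z-normalize each feature column across documents."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # A constant column has zero deviation; fall back to 1.0 to avoid
    # division by zero, which maps that column to all zeros.
    sigmas = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


def tf_log(text):
    """Bag-of-words weights, assuming TF-LOG means log(1 + count)."""
    counts = Counter(text.lower().split())
    return {word: math.log(1 + c) for word, c in counts.items()}


def mean_squared_error(y_true, y_pred):
    """Mean squared error between gold and predicted levels."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))
```

In a full system, the z-normalized length features and the TF-LOG vector would be concatenated per document and passed to a regressor trained on ILR levels, with `mean_squared_error` used to compare models.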
Association for Computational Linguistics (ACL)
Shen, Wade
Williams, Jennifer
Marius, Tamas
Salesky, Elisabeth
9 August 2013
Shen, Wade, Williams, Jennifer, Marius, Tamas and Salesky, Elisabeth (2013) A language-independent approach to automatic text difficulty assessment for second-language learners. In Proceedings of the Second Workshop on Predicting and Improving Text Readability for Target Reader Populations. Association for Computational Linguistics (ACL). 9 pp.
Record type: Conference or Workshop Item (Paper)
More information
Published date: 9 August 2013
Identifiers
Local EPrints ID: 470362
URI: http://eprints.soton.ac.uk/id/eprint/470362
PURE UUID: 0851360f-f9a9-44d4-b59d-eae4ba50dffa
Catalogue record
Date deposited: 07 Oct 2022 16:30
Last modified: 20 Jul 2024 02:07
Contributors
Author: Wade Shen
Author: Jennifer Williams
Author: Tamas Marius
Author: Elisabeth Salesky