The University of Southampton
University of Southampton Institutional Repository

Linguistic development in L2 Spanish: creation and analysis of a learner corpus

Record type: Article

This project had two main aims: to create a small-scale, high-quality database of spoken learner Spanish as a new resource for the study of second language acquisition, and to undertake a short programme of substantive research using the new database.
Spoken Spanish data have been collected from classroom learners in schools and universities in England, using a series of specially designed elicitation tasks, including storytelling, picture description, discussion and individual interview. There were 20 learners at each of 3 levels: beginners (Year 9 students aged 13-14), intermediate students (A2 students aged 17-18), and fourth-year undergraduates. All of them were native English speakers. Depending on their level, each learner was audio-recorded undertaking between 3 and 5 oral tasks. They also completed computer-based and paper-based tasks which provided complementary data on aspects of their Spanish knowledge. For comparison purposes, small numbers of native speakers were also recorded undertaking the same tasks.
The resulting database contains 290 digital sound files (240 learner recordings, 50 native speaker recordings). These are accompanied by transcripts in CHILDES (Child Language Data Exchange System) format, which can be analysed with the CLAN linguistic analysis software. Some files have an extra layer of tagging which identifies parts of speech. A project website has been created and the material has been made freely available in anonymised form through the website, for use by other second language acquisition researchers.
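As a rough illustration of how transcripts in this format can be processed outside CLAN, the sketch below reads CHILDES-style main tiers (lines beginning with `*` and a speaker code) and counts running words and distinct word forms per speaker. The excerpt, the speaker codes `L01` and `INV`, and the function name are hypothetical and do not come from the project data; CLAN provides its own frequency tools, so this is only a minimal sketch of the file layout.

```python
import re
from collections import Counter, defaultdict

# Hypothetical CHAT-style excerpt, invented for illustration only.
SAMPLE = (
    "@Begin\n"
    "@Participants:\tL01 Learner, INV Investigator\n"
    "*INV:\tqué hay en la foto ?\n"
    "*L01:\thay un perro y un gato .\n"
    "*L01:\tel perro come .\n"
    "@End\n"
)


def words_by_speaker(chat_text):
    """Return {speaker: Counter of lower-cased word forms} from main tiers.

    Header lines (@...) and dependent tiers (%...) are skipped, as are
    continuation lines, which this simple sketch does not handle.
    """
    counts = defaultdict(Counter)
    for line in chat_text.splitlines():
        # Main tier lines look like "*SPK:<tab>utterance".
        match = re.match(r"\*(\w+):\s+(.*)", line)
        if not match:
            continue
        speaker, utterance = match.groups()
        # Keep alphabetic tokens only, dropping punctuation and digits.
        tokens = re.findall(r"[^\W\d_]+", utterance.lower())
        counts[speaker].update(tokens)
    return counts


counts = words_by_speaker(SAMPLE)
learner = counts["L01"]
tokens = sum(learner.values())  # running words produced by the learner
types = len(learner)            # distinct word forms produced by the learner
print(f"L01: {tokens} tokens, {types} types")
```

Counting word forms rather than base words is a simplification here; grouping inflected forms under a base word would need the part-of-speech layer or a lemmatiser.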
The substantive research programme undertaken so far has investigated the acquisition of two central features of Spanish grammar which differ from English, i.e. word order in sentences (more fixed in English, more variable in Spanish), and the pronoun system (Spanish object pronouns are marked for gender, and generally precede the verb). The third substantive issue investigated to date is the development of Spanish vocabulary.
The investigation of learners’ developing control of Spanish pronouns has been based on two tasks: a picture-based production task and a computer-based interpretation task. The performance of learners at all three levels has been compared on this issue. The results show that beginner and intermediate learners tend to avoid the use of pronouns, preferring to use full noun phrases in their place. However, once they start to use pronouns, learners’ usage tends to be accurate, and the advanced learners achieve high levels of accuracy (75 per cent in the production task, 90 per cent in the interpretation task). These findings are important for second language acquisition theory, because they are relevant to a central debate regarding the source(s) of learner errors, and the extent to which learners can ever develop a native-speaker-like grammar system. The findings support the view that learners can and do acquire a correct underlying representation of Spanish grammar, and that early errors in their performance are due to other problems such as processing limitations or communicative pressures.
The investigation of learners’ developing control of Spanish word order is relevant to another central discussion in SLA theory. Spanish word order is variable, with subjects both preceding and following verbs, in varying circumstances. The different possibilities are controlled by two interacting factors: a) the grammar associated with particular types of verb, and b) the kind of information being conveyed (broad or narrow focus). Different explanations have been advanced previously for the difficulties learners encounter with this system. Does the problem lie in the underlying grammar system, or do the learners have difficulty in distinguishing between different kinds of information to be conveyed, and in knowing how these should influence their decisions about word order? The findings from our study suggest that learners go through a stage of overgeneralising one of the grammatical options available in Spanish to inappropriate contexts, before eventually becoming native-like. The learners did not have ‘information type’ problems, as has sometimes been suggested in the literature.
The investigation of Spanish vocabulary development has been undertaken through analysis of learner performance on an oral interview task, combining picture description and personal conversation. A similar task was used with learners of French recorded in a sister project, and research is being undertaken in parallel on both datasets. So far, results have been published regarding vocabulary development from beginner to intermediate level (Year 9 and Year 13 learners). The results provide information of a type not previously available about language learners in the UK educational system. Learner development in the two languages is strikingly similar, with similar numbers of base words known at the two levels (though Spanish word knowledge was somewhat more diverse). In both languages, substantial gains were made from Year 9 to Year 13, in terms of the numbers of base words known, and also the range of inflections used. Use of part-of-speech tagging for this subset of the data allowed further analysis of the word classes used at different levels. The Year 9 speakers’ productions in both languages are noticeably noun-heavy, with verbs used proportionally much more by Year 13 speakers. Indicators of more complex language use, such as interrogative and relative pronouns and adverbs, were also more frequent in the Year 13 data. These findings have clear theoretical value (for our understanding of the relationship between vocabulary and grammar development, as well as of vocabulary development itself). They also have clear implications for curriculum design and for effective classroom pedagogy (e.g. the need to focus on development of verb knowledge).
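The word-class comparison described above can be sketched as follows, assuming the part-of-speech layer is stored as a CHILDES-style `%mor` tier in which each item has the form `pos|stem`. The two tier lines, their tags, and the function name are hypothetical and invented for illustration; they are not project data, and the real tag set depends on the tagger used.

```python
from collections import Counter

# Hypothetical %mor tiers, invented for illustration only.
YEAR9_MOR = "%mor:\tdet|un n|perro conj|y det|un n|gato ."
YEAR13_MOR = "%mor:\tdet|el n|perro v|comer adv|rápido conj|y v|dormir ."


def pos_proportions(mor_line):
    """Return {word class: share of tagged items} for one %mor tier line."""
    body = mor_line.split("\t", 1)[1]
    tags = []
    for item in body.split():
        if "|" not in item:
            continue  # untagged items such as final punctuation
        pos = item.split("|", 1)[0]
        # Collapse subcategories such as "pro:rel" to the main class "pro".
        tags.append(pos.split(":")[0])
    total = len(tags)
    if total == 0:
        return {}
    return {pos: n / total for pos, n in Counter(tags).items()}


year9 = pos_proportions(YEAR9_MOR)
year13 = pos_proportions(YEAR13_MOR)
print("Year 9 noun share:", year9.get("n", 0.0))
print("Year 13 verb share:", year13.get("v", 0.0))
```

Reporting proportions rather than raw counts keeps speakers who produce different amounts of speech comparable, which is the sense in which one group can be called noun-heavy relative to another.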
The corpus is being promoted and findings are being disseminated through a series of conference presentations in the UK, Europe and North America. A successful end-of-project seminar was also organised in Southampton in January 2008, and attracted researchers from the UK, Spain, Portugal and the USA with whom ongoing networking is planned. A successor project begins in August 2008 and will allow further promotion of the corpus as well as continuation and extension of the substantive research programme.

Full text not available from this repository.


Mitchell, R.F., Dominguez, L., Arche, M.J., Myles, F. and Marsden, E. (2008) Linguistic development in L2 Spanish: creation and analysis of a learner corpus. Eurosla Yearbook, pp. 287-304.

More information

Published date: 30 June 2008
Keywords: second language acquisition, Spanish, corpus linguistics


Local EPrints ID: 63843
ISSN: 1568-1491
PURE UUID: 36328eed-2b46-439c-8163-a3d00191579d

Catalogue record

Date deposited: 22 Apr 2009
Last modified: 17 Jul 2017 14:15

Author: R.F. Mitchell
Author: L. Dominguez
Author: M.J. Arche
Author: F. Myles
Author: E. Marsden
