Linguistic development in L2 Spanish: creation and analysis of a learner corpus

This project had two main aims: to create a small-scale, high-quality database of spoken learner Spanish as a new resource for the study of second language acquisition, and to undertake a short programme of substantive research using the new database.

Spoken Spanish data have been collected from classroom learners in schools and universities in England, using a series of specially designed elicitation tasks, including storytelling, picture description, discussion and individual interview. There were 20 learners at each of three levels: beginners (Year 9 students aged 13-14), intermediate students (A2 students aged 17-18), and fourth-year undergraduates. All of them were native English speakers. Depending on their level, each learner was audio-recorded undertaking between three and five oral tasks. They also completed computer-based and paper-based tasks which provided complementary data on aspects of their knowledge of Spanish. For comparison purposes, small numbers of native speakers were also recorded undertaking the same tasks.

The resulting database contains 290 digital sound files (240 learner recordings and 50 native speaker recordings). These are accompanied by transcripts in CHILDES (Child Language Data Exchange System) format, which can be analysed with the CLAN linguistic analysis software. Some files have an extra layer of tagging which identifies parts of speech. A project website has been created, and the material has been made freely available through it in anonymised form for use by other second language acquisition researchers. The website can be viewed at www.splloc.soton.ac.uk.
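
As an illustration only, the kind of transcript layout described above can be processed with a few lines of Python. The sketch below assumes the general CHILDES convention that header lines begin with "@", speaker lines with "*" and a speaker code, and annotation lines with "%". The sample text, the speaker codes L01 and INV, and the function names are invented for this example and are not taken from the corpus. Real transcripts also contain special transcription codes and continuation lines that this sketch does not handle, and CLAN remains the intended analysis tool.

```python
from collections import Counter

# Invented sample in a CHILDES-style layout; not taken from the corpus.
SAMPLE = """@Begin
@Participants:\tL01 Learner, INV Investigator
*INV:\tqué hay en la foto ?
*L01:\thay un perro y un gato .
*L01:\tel perro es grande .
%com:\tlearner points at the picture
@End
"""


def main_tier_words(chat_text, speaker):
    """Return lower-cased word tokens from one speaker's utterance lines."""
    prefix = f"*{speaker}:"
    words = []
    for line in chat_text.splitlines():
        if not line.startswith(prefix):
            continue  # skip headers (@), annotation lines (%), other speakers
        utterance = line[len(prefix):]
        for token in utterance.split():
            # Keep only tokens containing a letter, dropping bare punctuation.
            if any(ch.isalpha() for ch in token):
                words.append(token.lower())
    return words


def type_token_counts(words):
    """Return (number of distinct word forms, total number of word tokens)."""
    counts = Counter(words)
    return len(counts), len(words)


words = main_tier_words(SAMPLE, "L01")
types, tokens = type_token_counts(words)
print(types, tokens)
```

The counts here are over surface word forms for a single speaker; counting base words, as in the vocabulary study, would additionally require grouping inflected forms together.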

The substantive research programme undertaken so far has investigated the acquisition of two central features of Spanish grammar which differ from English: word order in sentences (more fixed in English, more variable in Spanish) and the pronoun system (Spanish object pronouns are marked for gender and generally precede the verb). The third substantive issue investigated to date is the development of Spanish vocabulary.

The investigation of learners’ developing control of Spanish pronouns has been based on two tasks: a picture-based production task and a computer-based interpretation task. The performance of learners at all three levels has been compared on these tasks. The results show that beginner and intermediate learners tend to avoid pronouns, preferring to use full noun phrases in their place. However, once they start to use pronouns, learners’ usage tends to be accurate, and the advanced learners achieve high levels of accuracy (75 per cent in the production task, 90 per cent in the interpretation task). These findings are important for second language acquisition theory because they bear on a central debate regarding the source(s) of learner errors and the extent to which learners can ever develop a native-speaker-like grammar system. The findings support the view that learners can and do acquire a correct underlying representation of Spanish grammar, and that early errors in their performance are due to other problems such as processing limitations or communicative pressures.

The investigation of learners’ developing control of Spanish word order is relevant to another central discussion in second language acquisition (SLA) theory. Spanish word order is variable, with subjects both preceding and following verbs in varying circumstances. The different possibilities are controlled by two interacting factors: a) the grammar associated with particular types of verb, and b) the kind of information being conveyed (broad or narrow focus). Different explanations have previously been advanced for the difficulties learners encounter with this system. Does the problem lie in the underlying grammar system, or do learners have difficulty in distinguishing between the different kinds of information to be conveyed and in working out how these should influence their decisions about word order? The findings from our study suggest that learners go through a stage of overgeneralising one of the grammatical options available in Spanish to inappropriate contexts, before eventually becoming native-like. The learners did not have ‘information type’ problems, as has sometimes been suggested in the literature.

The investigation of Spanish vocabulary development has been undertaken through analysis of learner performance on an oral interview task combining picture description and personal conversation. A similar task was used with learners of French recorded in a sister project (details at www.flloc.soton.ac.uk), and research is being undertaken in parallel on both datasets. So far, results have been published regarding vocabulary development from beginner to intermediate level (Year 9 and Year 13 learners). The results provide information of a type not previously available about language learners in the UK educational system. Learner development in the two languages is strikingly similar, with similar numbers of base words known at the two levels (though Spanish word knowledge was somewhat more diverse). In both languages, substantial gains were made from Year 9 to Year 13 in the number of base words known and in the range of inflections used. Part-of-speech tagging of this subset of the data allowed further analysis of the word classes used at different levels. The Year 9 speakers’ productions in both languages are noticeably noun-heavy, with verbs used proportionally much more by Year 13 speakers. Indicators of more complex language use, such as interrogative and relative pronouns and adverbs, were also more frequent in the Year 13 data. These findings have clear theoretical value for our understanding of vocabulary development itself and of the relationship between vocabulary and grammar development. They also have clear implications for curriculum design and for effective classroom pedagogy (e.g. the need to focus on development of verb knowledge).
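
To make the word-class comparison above concrete, the following Python sketch shows one way that proportions of nouns and verbs could be computed from part-of-speech tagged lines. It assumes tagged tokens written as "category|stem" on lines beginning with "%mor:", which is the general shape of CHILDES morphological annotation. The sample lines, the category labels (n, v, det, conj, adj) and the function names are invented for illustration, and this is not the procedure used in the project.

```python
from collections import Counter

# Invented tagged lines; each token is written as "category|stem".
# The category labels used here are illustrative assumptions.
MOR_LINES = [
    "%mor:\tdet|un n|perro conj|y det|un n|gato .",
    "%mor:\tdet|el n|perro v|ser adj|grande .",
]


def pos_counts(mor_lines):
    """Count how often each word-class label occurs in the tagged lines."""
    counts = Counter()
    for line in mor_lines:
        # Everything after the first tab is the tagged utterance.
        _, _, body = line.partition("\t")
        for token in body.split():
            if "|" in token:  # skip punctuation and untagged material
                category = token.split("|", 1)[0]
                counts[category] += 1
    return counts


def proportion(counts, category):
    """Share of all tagged tokens that carry the given word-class label."""
    total = sum(counts.values())
    return counts[category] / total if total else 0.0


counts = pos_counts(MOR_LINES)
print(counts["n"], counts["v"], round(proportion(counts, "n"), 2))
```

Running the same calculation separately on Year 9 and Year 13 files would give the kind of noun and verb proportions that the comparison between levels relies on.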

The corpus is being promoted and findings are being disseminated through a series of conference presentations in the UK, Europe and North America. A successful end-of-project seminar was also organised in Southampton in January 2008 and attracted researchers from the UK, Spain, Portugal and the USA, with whom ongoing networking is planned. A successor project begins in August 2008 and will allow further promotion of the corpus as well as continuation and extension of the substantive research programme.

second language acquisition, Spanish, corpus linguistics

Mitchell, R.F.

Dominguez, L.

Arche, M.J.

Myles, F.

Marsden, E.

30 June 2008