The impact of rater experience and essay quality on raters’ decision-making behaviors

Sahan, Özgür (2019) The impact of rater experience and essay quality on raters’ decision-making behaviors. American Association for Applied Linguistics (AAAL) Conference, Sheraton Hotel, Atlanta, United States. 09 - 12 Mar 2019.

Record type: Conference or Workshop Item (Paper)

Abstract

Because assessing EFL students’ writing is a subjective process, detailed scoring rubrics are often used to help minimize rater inconsistency and produce reliable scores. However, focusing exclusively on the outcomes of the assessment task might obscure the processes through which raters arrive at their decisions. In this sense, think-aloud protocols can be considered an important tool for better understanding raters’ thought processes. This study examines the decision-making behaviors of raters with varying levels of experience while assessing EFL essays of distinct qualities. The data were collected from 28 raters who were Turkish nationals and full-time employees at the English language departments of different state universities in Turkey. They were divided into three experience groups (low, n = 11; medium, n = 8; high, n = 9) based on their reported experience assessing EFL writing. Using a 10-point analytic rubric, each rater voice-recorded their thoughts while scoring 16 essays of distinct text qualities. The data were transcribed and coded using a coding scheme adapted from Cumming, Kantor, and Powers (2002). The results revealed that raters used more interpretation strategies than judgment strategies in their assessments and employed more self-monitoring strategies than language-focused or rhetorical and ideational-focused strategies. Moreover, raters prioritized aspects of style, grammar, and mechanics when rating low-quality essays but emphasized rhetoric and their general impressions of the text for high-quality essays. Furthermore, the medium- and high-experience groups resembled each other in their decision-making behaviors more than either resembled the low-experience group. The findings suggest that raters’ scoring behaviors might evolve as they become more experienced and that more experienced raters might rely more on their own judgments and less on the scoring criteria. As such, this research provides implications for developing strategy-based rater training programs, which might help to increase consistency across raters of different experience levels.
