University of Southampton Institutional Repository

AI and human scoring for postgraduate writing: evaluating score reliability, variability, and rater behaviours




Han, Turgay and Zheng, Ying (2026) AI and human scoring for postgraduate writing: evaluating score reliability, variability, and rater behaviours. Studies in Educational Evaluation, 88, [101572]. (doi:10.1016/j.stueduc.2026.101572).

Record type: Article

Abstract

This study examines the reliability and consistency of AutoMarkGPT, a customized version of ChatGPT-4.0, in scoring postgraduate writing assignments across multiple time intervals. While Automated Writing Evaluation (AWE) tools and AI models are increasingly utilized in educational contexts, prior research has largely relied on standard models and one-time scoring sessions. Addressing this gap, the study compares AutoMarkGPT’s performance with that of four human raters who assessed the same 97 assignments. Employing a convergent parallel mixed-methods design, the research integrates quantitative analysis, including t-tests, correlations, and Many-Facet Rasch Measurement via Facets, with qualitative data from post-rating interviews. Results reveal that AutoMarkGPT provided more consistent and generally higher scores than human raters, who demonstrated stricter grading and greater variability due to subjective factors, such as rubric interpretation and professional background. However, AI showed mild fluctuations in scores over time. Findings suggest that blending AI and human input could enhance assessment reliability, provided continuous rater training is ensured.

Text
Han & Zheng 2026 (accepted version) - Accepted Manuscript
Restricted to Repository staff only until 11 August 2027.

More information

Accepted/In Press date: 23 January 2026
e-pub ahead of print date: 11 February 2026
Published date: 11 February 2026
Keywords: AI in scoring, Automated Writing Evaluation (AWE), Rater behaviour, Score reliability, Score variability

Identifiers

Local EPrints ID: 510517
URI: http://eprints.soton.ac.uk/id/eprint/510517
ISSN: 1879-2529
PURE UUID: 2a93941e-428a-4c9e-a0e2-b2b5418c7fee
ORCID for Ying Zheng: orcid.org/0000-0003-2574-0358

Catalogue record

Date deposited: 13 Apr 2026 14:38
Last modified: 14 Apr 2026 01:49


Contributors

Author: Turgay Han
Author: Ying Zheng (orcid.org/0000-0003-2574-0358)



Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2

This repository has been built using EPrints software, developed at the University of Southampton, but available to everyone to use.
