Scoring games fairly: Biases and interference in games based assessment
Gaming is an interactive medium that has much in common with education. Both games and good classroom practice are learning environments, with overall objectives, scaffolded progression, checks along the way, and regular, purposeful feedback. Games also provide a space to practise complex skills such as collaboration or managing a system. These skills are rarely assessed directly in compulsory education because they are difficult to evidence efficiently. Games are fun learning environments for many children, and they could provide a means to resolve this problem. However, the structure of gaming data is not aligned to many assessment analysis methods. Gaming data is conditionally dependent; there are continuous variables as well as categorical and dichotomous responses; there is often more than one possible proxy for ability; and there are very large amounts of missing data. Aspects that assessors traditionally force to be constants, such as the number of attempts or the response time, become variables in games, and it is important to know their limitations and worth as variables. This interdisciplinary study looks at these problems in scoring performance in games. It uses a quantitative methodology, with a case study secondary data set from MangaHigh. MangaHigh is a website with a range of dynamic maths games for primary- and secondary-aged learners, and over a million children were using the site at the time of data extraction. Using a sample data set, chosen by criterion sampling, the impact of missing data, response times and additional attempts was explored through insights and methods from Item Response Theory (IRT) and other quantitative analysis techniques. Demographic data also helped to contextualize the findings and inform decision-making. In the analysis, the choice of game mechanics was found to have an impact on the extent and nature of missing data, which in turn was found to have a complex relationship with the target variable, ability.
The choice of measure, such as mean, recency-weighted mean, high score or most recent score, was found to be central to determining the grade. Several issues were identified when the child competed against a human or bot competitor or collaborator. Response time functioned as a context variable to define valid attempts, helping to identify non-targeted behaviours such as browsing, conceding or wandering off. As gamers have suggested, response time also appeared to function as a proxy for ability, but there does not seem to be a linear relationship between ability and time. Instead, ‘speed’ seems to be the proxy, and this was found to be a function of the response time, the child, the game, and also the band score and game mechanics. Outside of an optimal range, short response times could act as a confounding variable. There was evidence that some stability of performance may also act as a proxy for ability. Finally, adding a familiarity weighting when a child comes back for a second attempt proved problematic, but a novelty weighting for early attempts can work. Having said that, although games became easier with each subsequent attempt, evidence from the first attempt at playing appears unreliable, and the data has features that are characteristic of guessing behaviour. Although a large number of problems were identified, this analysis also found some clear ways forward to adjust the assessment and games design, and the collection of data, to make scores from games more meaningful and reduce bias in the scoring process. On the basis of this study, there are many design choices that could improve or degrade the quality of data gathered in gaming environments.
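To illustrate why the choice of measure matters, the sketch below computes the four measures named in the abstract for one child's sequence of attempt scores. It is a minimal illustration, not the method used in the thesis: the exponential decay weighting, the `decay` parameter, the function names and the sample scores are all assumptions made for this example.

```python
from statistics import mean


def recency_weighted_mean(scores, decay=0.5):
    """Weighted mean in which the latest attempt has weight 1 and each
    earlier attempt is down-weighted by a further factor of `decay`.
    The exponential scheme is an assumption made for this sketch."""
    n = len(scores)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def summarise(scores, decay=0.5):
    """Return the four candidate measures for one ordered list of attempts."""
    return {
        "mean": mean(scores),
        "recency_weighted_mean": recency_weighted_mean(scores, decay),
        "high_score": max(scores),
        "most_recent": scores[-1],
    }


# Hypothetical attempt scores, oldest first (not data from the thesis).
attempts = [20, 80, 60, 40]
for name, value in summarise(attempts).items():
    print(f"{name}: {value:.1f}")
```

For this invented sequence the four measures range from 40 (most recent) to 80 (high score), so the same play history could map to quite different grades depending on which measure is chosen.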
University of Southampton
Walsh, Clare Elizabeth
3972b47c-5ce7-45fc-b843-7dcbde9504de
January 2020
Walsh, Clare Elizabeth (2020) Scoring games fairly: Biases and interference in games based assessment. Doctoral Thesis, 308pp.
Record type: Thesis (Doctoral)
Text
Thesis Clare Walsh for depositing
Text
cew2g15 Permission to deposit_Rw
Restricted to Repository staff only
More information
Published date: January 2020
Identifiers
Local EPrints ID: 448273
URI: http://eprints.soton.ac.uk/id/eprint/448273
PURE UUID: 35278fc9-06b8-4c31-b715-f9b29d230b21
Catalogue record
Date deposited: 19 Apr 2021 16:30
Last modified: 16 Mar 2024 09:44
Contributors
Author: Clare Elizabeth Walsh