The University of Southampton
University of Southampton Institutional Repository

Interpretable machine learning insights into inequalities in access to online learning

Access to education is the first step to benefiting from it. Although cumulative online learning experience is linked to academic learning gains (McIntyre, submitted), between-country inequalities prevent large populations from accumulating such experience. Low- and middle-income countries (LMICs) are disadvantaged in infrastructure, such as internet access, and by uncontextualised learning content, and parents there are less available and less well-resourced than in high-income countries. COVID-19 has exacerbated these global inequalities, with girls affected more than boys in these regions.
Therefore, the present research mined online learning data to identify features that are important for access to online learning. Starting from 54,842,787 initial data points from one online learning platform, model development partnered theory with data. A theory-led machine learning model was examined first, and a data-led approach was then taken to reach a final model. A linear regression model was regularised with the Lasso penalty to enable data-driven feature selection. The twenty-five features selected were used to build an extreme gradient boosting model, which underwent hyper-parameter tuning. All cross-validation adopted the grid search approach. The final model was used to derive Shapley values for feature importance.
As expected, country differences, gender, and COVID-19 were important features in access to online learning. The data-led model development resulted in additional insights not examined in the initial, theory-led model: namely, the importance of math ability, year of birth, session difficulty level, month of birth, and time taken to complete a session.
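To illustrate the last step of the pipeline described above, the following is a minimal sketch of how Shapley values attribute a single prediction to individual features. It is not the study's model or code: `toy_model`, its coefficients, the three feature names, and the all-zero baseline are hypothetical, chosen only so the arithmetic can be checked by hand. The sketch enumerates every feature coalition exactly, which is feasible only for a handful of features; a model with twenty-five features would in practice rely on an approximation or a tree-specific algorithm.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this suits only a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    """Hypothetical stand-in for a fitted model (assumption, not from the study)."""
    gender, covid_period, maths_ability = features
    return (2.0 * gender + 3.0 * covid_period + 1.0 * maths_ability
            + 0.5 * gender * covid_period)


x = [1.0, 1.0, 2.0]          # one hypothetical learner
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(toy_model, x, baseline)

for name, value in zip(["gender", "covid_period", "maths_ability"], phi):
    print(f"{name}: {value:.3f}")
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term between the first two features is split equally between them.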
McIntyre, Nora

McIntyre, Nora (2022) Interpretable machine learning insights into inequalities in access to online learning. McIntyre, Nora (ed.) In Machines with meaning: The potential of machine learning in educational research: A symposium convened at the Biennial Meeting of EARLI SIG27.

Record type: Conference or Workshop Item (Paper)

This record has no associated files available for download.

More information

Published date: 1 September 2022
Venue - Dates: EARLI SIG27 Conference, University of Southampton, Southampton, United Kingdom, 2022-08-30 - 2022-09-01

Identifiers

Local EPrints ID: 471079
URI: http://eprints.soton.ac.uk/id/eprint/471079
PURE UUID: 71727d1a-23e6-4593-b313-bf000edc6b7e
ORCID for Nora McIntyre: orcid.org/0000-0003-4626-3298

Catalogue record

Date deposited: 25 Oct 2022 16:42
Last modified: 26 Oct 2022 02:01

Contributors

Author: Nora McIntyre
Editor: Nora McIntyre
