Interpretable machine learning insights into inequalities in access to online learning
Access to education is the first step to benefiting from it. Although cumulative online learning experience is linked to academic learning gains (McIntyre, submitted), between-country inequalities prevent large populations from accumulating such experience. Low- and middle-income countries (LMICs) are affected by disadvantages in infrastructure, such as internet access and uncontextualised learning content, and by parents who are less available and less well-resourced than those in high-income countries. COVID-19 has exacerbated these global inequalities, with girls affected more than boys in these regions.
Therefore, the present research mined online learning data to identify features that are important for access to online learning. Data mining of 54,842,787 initial data points from one online learning platform was conducted by partnering theory with data in model development. Following examination of a theory-led machine learning model, a data-led approach was taken to reach a final model. A linear regression model was regularised with the Lasso penalty to enable data-driven feature selection. The twenty-five features selected were used to form an extreme gradient boosting model, which underwent hyper-parameter tuning. Grid search was used in all cross-validation. The final model was used to derive Shapley values for feature importance.
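To illustrate what a Shapley value for feature importance means, the following is a minimal sketch, not the study's code. It computes exact Shapley values for a single prediction by averaging each feature's marginal contribution over all coalitions of the other features. The function `toy_model`, its three features, and its coefficients are hypothetical stand-ins for a fitted model; the record does not state which implementation was used, and exact enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance relative to a baseline.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this is only for illustration.
    """
    n = len(instance)

    def value(coalition):
        # Build an input where only coalition members take the instance's values.
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(x):
    # Hypothetical access score; names and coefficients are assumptions.
    country_income, is_female, during_covid = x
    return 2.0 * country_income - 1.0 * is_female * during_covid - 0.5 * during_covid


instance = [1.0, 1.0, 1.0]   # the learner being explained
baseline = [0.0, 0.0, 0.0]   # reference input
phi = shapley_values(toy_model, instance, baseline)
print(phi)
```

In this toy case the interaction between the second and third features is split equally between them, and the values sum to the difference between the prediction for the instance and the prediction for the baseline.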
As expected, country differences, gender, and COVID-19 were important features in access to online learning. The data-led model development resulted in additional insights not examined in the initial, theory-led model: namely, the importance of math ability, year of birth, session difficulty level, month of birth, and time taken to complete a session.
McIntyre, Nora
1 September 2022
McIntyre, Nora (2022) Interpretable machine learning insights into inequalities in access to online learning. McIntyre, Nora (ed.) In Machines with meaning: The potential of machine learning in educational research: A symposium convened at the Biennial Meeting of EARLI SIG27.
Record type: Conference or Workshop Item (Paper)
More information
Published date: 1 September 2022
Venue - Dates:
EARLI SIG27 Conference, University of Southampton, Southampton, United Kingdom, 2022-08-30 - 2022-09-01
Identifiers
Local EPrints ID: 471079
URI: http://eprints.soton.ac.uk/id/eprint/471079
PURE UUID: 71727d1a-23e6-4593-b313-bf000edc6b7e
Catalogue record
Date deposited: 25 Oct 2022 16:42
Last modified: 26 Oct 2022 02:01
Contributors
Editor:
Nora McIntyre