Feature engineering for clustering analysis of large and heterogeneous educational datasets

Wilde, Adriana (2021) Feature engineering for clustering analysis of large and heterogeneous educational datasets. Women in Data Science Cambridge conference, Online, Stanford, United States. 11 Mar 2021. (In Press)

Record type: Conference or Workshop Item (Poster)

Abstract

Data were collected from a total of 271,867 learners across nineteen University of Southampton courses run between 2014 and 2019, on HCI, archaeology, and the pedagogy of language teaching. Seventeen of these were MOOC runs, and two were face-to-face modules that included activities within a peer-supported digital environment. The MOOC data came from two distinct courses, offered eleven and six times respectively on FutureLearn. The University of Southampton is one of FutureLearn's founding partners and has 20 other MOOCs on offer on the platform.

The samples included timestamped digital traces of activity and comments generated by learners in these courses. For MOOC learners, achievement was measured as the percentage of the course's steps they completed (50% being the minimum required for certificate eligibility); for face-to-face learners, it was the marks awarded in the module's assessment.
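As a minimal sketch of this kind of feature engineering, assuming a hypothetical event log of (learner, timestamp, event type, detail) tuples, the Python below derives a few per-learner features and the MOOC achievement measure (percentage of distinct steps completed, with the 50% certificate threshold), then serialises them as ARFF. The event schema, the feature names, `TOTAL_STEPS`, and the helpers `learner_features` and `to_arff` are all hypothetical; the poster's actual 60+ features are not listed in this record, and face-to-face marks are not modelled here.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log: (learner_id, ISO timestamp, event_type, detail).
# detail is a step id for step events and the comment text for comments.
EVENTS = [
    ("L1", "2016-03-01T10:00:00", "step_visit", "s1"),
    ("L1", "2016-03-01T10:05:00", "step_complete", "s1"),
    ("L1", "2016-03-02T09:00:00", "step_complete", "s2"),
    ("L1", "2016-03-02T09:30:00", "comment", "Interesting example"),
    ("L2", "2016-03-01T11:00:00", "step_visit", "s1"),
]
TOTAL_STEPS = 4  # hypothetical number of steps in the course

NUMERIC = ["n_events", "n_comments", "mean_comment_len",
           "active_days", "pct_steps_completed"]


def learner_features(events, total_steps):
    """Aggregate timestamped events into one feature dict per learner."""
    per_learner = defaultdict(list)
    for learner, ts, kind, detail in events:
        per_learner[learner].append((datetime.fromisoformat(ts), kind, detail))

    rows = {}
    for learner, evs in per_learner.items():
        # Count distinct steps so repeated completions are not double-counted.
        completed = {d for _, k, d in evs if k == "step_complete"}
        comments = [d for _, k, d in evs if k == "comment"]
        pct = 100.0 * len(completed) / total_steps
        rows[learner] = {
            "n_events": len(evs),
            "n_comments": len(comments),
            "mean_comment_len": (sum(len(c) for c in comments) / len(comments)
                                 if comments else 0.0),
            "active_days": len({ts.date() for ts, _, _ in evs}),
            "pct_steps_completed": pct,
            # 50% of steps is the certificate-eligibility threshold
            "cert_eligible": "yes" if pct >= 50.0 else "no",
        }
    return rows


def to_arff(rows, relation="learners"):
    """Serialise feature rows in ARFF (the learner id is not emitted)."""
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in NUMERIC]
    lines += ["@ATTRIBUTE cert_eligible {yes,no}", "", "@DATA"]
    for learner in sorted(rows):
        r = rows[learner]
        lines.append(",".join([f"{r[n]:g}" for n in NUMERIC] + [r["cert_eligible"]]))
    return "\n".join(lines) + "\n"


rows = learner_features(EVENTS, TOTAL_STEPS)
print(to_arff(rows))
```

Dropping the learner id keeps identifiers out of the clustering input; a real pipeline would keep a separate mapping from ARFF row to learner.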

Feature engineering produced ARFF files with over 60 features per learner, which were used in a ten-fold cross-validation comparing three clustering algorithms, Expectation Maximization (EM), Simple K-Means, and X-Means, with k ranging from 4 to 7.
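The abstract does not say how each clustering was scored under cross-validation. One common option, assumed here purely for illustration, is to fit on nine folds and measure the within-cluster sum of squared errors (SSE) of the held-out fold. The pure-Python sketch below does this for plain k-means (Lloyd's algorithm) with k from 4 to 7. The ARFF format and algorithm names suggest Weka was used, but this is a stand-in for, not a reproduction of, those Simple K-Means, EM and X-Means runs; `kmeans`, `cv_heldout_sse` and the synthetic data are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns the list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        new = []
        for j, members in enumerate(clusters):
            if members:
                new.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new.append(centroids[j])  # keep an empty cluster's old centroid
        if new == centroids:
            break  # assignments have stabilised
        centroids = new
    return centroids


def cv_heldout_sse(points, k, folds=10, seed=0):
    """Mean held-out SSE: fit on (folds-1)/folds of the data, score the rest."""
    idx = list(range(len(points)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(folds):
        test = [points[i] for i in idx[f::folds]]
        train = [points[i] for n, i in enumerate(idx) if n % folds != f]
        cents = kmeans(train, k, seed=seed)
        scores.append(sum(min(sq_dist(p, c) for c in cents) for p in test))
    return sum(scores) / folds


# Synthetic 2-D stand-in for the learner feature vectors (hypothetical).
rng = random.Random(42)
centres = [(0, 0), (5, 5), (0, 5), (5, 0), (10, 2)]
data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in centres for _ in range(40)]

results = {k: cv_heldout_sse(data, k) for k in range(4, 8)}
for k, sse in results.items():
    print(k, round(sse, 2))
```

Held-out SSE tends to keep falling as k grows, so it is usually read for an elbow or paired with a penalised criterion (X-Means, for instance, chooses k using BIC) rather than minimised outright. In practice the features would also be standardised first, since raw counts and percentages sit on very different scales.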

This poster offers details on the feature engineering process and preliminary findings.


WiDS2021-agw106-clustering - Accepted Manuscript

Restricted to Repository staff only

Available under License Creative Commons Attribution.
