Feature engineering for clustering analysis of large and heterogeneous educational datasets
Data were collected from a total of 271,867 learners on nineteen courses run by the University of Southampton between 2014 and 2019, covering HCI, archaeology, and the pedagogy of language teaching. Seventeen of these courses were MOOCs, and two were taught face-to-face with activities in a peer-supported digital environment. The MOOC data came from two distinct courses offered eleven and six times respectively on FutureLearn. The University of Southampton is one of FutureLearn's founding partners and has 20 other MOOCs on offer on the platform.
The samples included timestamped digital traces of activity and comments generated by learners in these courses. For MOOC learners, achievement data consisted of the percentage of the course's steps that the learner completed (with 50% being the minimum required for certification eligibility), whereas for face-to-face learners it consisted of the marks awarded in the module's assessment.
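As an illustration of how timestamped traces might be turned into per-learner features, the following is a minimal sketch and not the poster's actual pipeline. The event tuple layout, the `kind` labels ("step_completed", "comment"), and the feature names are all assumptions made for the example; only the 50% eligibility threshold comes from the description above.

```python
from collections import defaultdict
from datetime import datetime

# Minimum percentage of steps for certification eligibility (from the abstract).
CERTIFICATION_THRESHOLD = 50.0


def learner_features(events, total_steps):
    """Aggregate events into one feature dict per learner.

    Each event is a hypothetical tuple (learner_id, iso_timestamp, kind, step),
    where kind is "step_completed" or "comment".
    """
    completed = defaultdict(set)    # distinct steps completed per learner
    comments = defaultdict(int)     # number of comments per learner
    active_days = defaultdict(set)  # distinct calendar days with any activity

    for learner_id, timestamp, kind, step in events:
        active_days[learner_id].add(datetime.fromisoformat(timestamp).date())
        if kind == "step_completed":
            completed[learner_id].add(step)
        elif kind == "comment":
            comments[learner_id] += 1

    features = {}
    for learner_id in active_days:
        pct = 100.0 * len(completed[learner_id]) / total_steps
        features[learner_id] = {
            "pct_steps_completed": pct,
            "n_comments": comments[learner_id],
            "n_active_days": len(active_days[learner_id]),
            "certificate_eligible": pct >= CERTIFICATION_THRESHOLD,
        }
    return features
```

For face-to-face learners the achievement feature would instead be the awarded mark, which this sketch does not model.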
Feature engineering produced ARFF files with over 60 features per learner, used for a ten-fold cross-validation of clustering algorithms contrasting Expectation Maximization (EM), Simple K-Means and X-Means, with k varying from 4 to 7.
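ARFF is the plain-text attribute-relation format read by Weka, and the algorithm names above match Weka's clusterers, although the record does not state which toolkit was used. The sketch below shows one way a table of per-learner features could be serialised to ARFF using only the Python standard library. The function name `to_arff`, the relation name, and the attribute names are hypothetical; the range of k is taken from the description above.

```python
def to_arff(relation, rows, attributes):
    """Serialise per-learner feature dicts as ARFF text.

    `attributes` maps each feature name to "NUMERIC" or to a list of
    nominal values; features missing from a row are written as "?".
    """
    lines = [f"@RELATION {relation}", ""]
    for name, kind in attributes.items():
        # Nominal attributes are declared as a brace-delimited value list.
        spec = kind if isinstance(kind, str) else "{" + ",".join(kind) + "}"
        lines.append(f"@ATTRIBUTE {name} {spec}")
    lines += ["", "@DATA"]
    for row in rows:
        lines.append(",".join(str(row.get(name, "?")) for name in attributes))
    return "\n".join(lines) + "\n"


# Values of k to compare: 4 to 7 inclusive, as stated in the abstract.
K_VALUES = range(4, 8)
```

A file written this way could then be loaded once per value in `K_VALUES` by whichever clustering implementation is being evaluated.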
This poster offers details on the feature engineering process and preliminary findings.
clustering, feature engineering, educational data
Wilde, Adriana
Wilde, Adriana
(2021)
Feature engineering for clustering analysis of large and heterogeneous educational datasets.
Women in Data Science Cambridge conference, Online, Stanford, United States.
11 Mar 2021.
(In Press)
Record type:
Conference or Workshop Item
(Poster)
Text
WiDS2021-agw106-clustering
- Accepted Manuscript
Restricted to Repository staff only
More information
Accepted/In Press date: 16 February 2021
Venue - Dates:
Women in Data Science Cambridge conference, Online, Stanford, United States, 2021-03-11 - 2021-03-11
Identifiers
Local EPrints ID: 446677
URI: http://eprints.soton.ac.uk/id/eprint/446677
PURE UUID: 96fe8471-6b48-4d92-a39a-0be0ce6c7a43
Catalogue record
Date deposited: 17 Feb 2021 17:34
Last modified: 17 Mar 2024 03:23
Contributors
Author:
Adriana Wilde