The University of Southampton
University of Southampton Institutional Repository

Feature engineering for clustering analysis of large and heterogeneous educational datasets

Wilde, Adriana Gabriela (2021) Feature engineering for clustering analysis of large and heterogeneous educational datasets. Women in Data Science Cambridge conference, Online, Stanford, United States. 11 Mar 2021. (In Press)

Record type: Conference or Workshop Item (Poster)

Abstract

Data were collected from a total of 271,867 learners across nineteen University of Southampton courses run between 2014 and 2019, on topics in HCI, archaeology, and the pedagogy of language teaching. Seventeen of these courses were MOOCs, and two were taught face-to-face and included activities within a peer-supported digital environment. The MOOC data came from two distinct courses, offered eleven and six times respectively on FutureLearn. The University of Southampton is one of FutureLearn's founding partners and has 20 other MOOCs on offer on the platform.

The samples included timestamped digital traces of learners' activity and the comments they posted in these courses. For MOOC learners, achievement was measured as the percentage of the course's steps the learner completed (50% being the minimum required for certification eligibility); for face-to-face learners, it was the actual mark awarded in the module assessment.
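As an illustration only, the sketch below shows how timestamped activity traces might be aggregated into per-learner features, including the MOOC achievement measure (percentage of steps completed, with 50% as the certification threshold). The event layout `(learner_id, step_id, iso_timestamp, is_comment)`, the function name `learner_features`, and the specific features are hypothetical assumptions, not the poster's actual feature set or the FutureLearn export format.

```python
from collections import defaultdict
from datetime import datetime

# Certification threshold for MOOCs, as stated in the abstract (% of steps).
CERTIFICATION_THRESHOLD = 50.0


def learner_features(events, total_steps):
    """Aggregate hypothetical (learner_id, step_id, iso_timestamp, is_comment)
    events into a few illustrative per-learner features."""
    steps_done = defaultdict(set)   # distinct steps with a non-comment event
    comments = defaultdict(int)     # number of comments posted
    active_days = defaultdict(set)  # calendar days with any activity
    for learner, step, ts, is_comment in events:
        active_days[learner].add(datetime.fromisoformat(ts).date())
        if is_comment:
            comments[learner] += 1
        else:
            # Assumption: any non-comment event on a step counts as completing it.
            steps_done[learner].add(step)

    features = {}
    for learner in active_days:  # every learner with at least one event
        pct = 100.0 * len(steps_done[learner]) / total_steps
        features[learner] = {
            "steps_completed_pct": pct,
            "comment_count": comments[learner],
            "active_days": len(active_days[learner]),
            "certification_eligible": pct >= CERTIFICATION_THRESHOLD,
        }
    return features
```

For face-to-face learners, the module mark would take the place of `steps_completed_pct` as the achievement measure.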

Feature engineering produced ARFF files with over 60 features per learner. These were used in a ten-fold cross-validation comparing three clustering algorithms, Expectation Maximization (EM), Simple K-Means and X-Means, with k varying from 4 to 7.
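ARFF is the attribute-relation file format read by Weka, whose clusterers include EM and SimpleKMeans. A minimal sketch of serialising per-learner feature rows to ARFF might look as follows; the function `to_arff`, the relation name and the attribute names are hypothetical, only numeric and nominal attributes are handled, and names are assumed not to need quoting.

```python
def to_arff(relation, rows, numeric, nominal=None):
    """Render per-learner feature rows (dicts) as ARFF text.

    numeric: ordered list of numeric attribute names.
    nominal: optional dict mapping attribute name -> list of allowed values.
    Missing values (None) are written as '?', ARFF's missing-value marker.
    Names are assumed to contain no spaces or commas, so no quoting is done.
    """
    nominal = nominal or {}
    lines = [f"@RELATION {relation}", ""]
    for name in numeric:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    for name, values in nominal.items():
        # Nominal attributes list their allowed values in braces.
        lines.append(f"@ATTRIBUTE {name} " + "{" + ",".join(values) + "}")
    lines += ["", "@DATA"]
    order = list(numeric) + list(nominal)  # column order in @DATA rows
    for row in rows:
        cells = ["?" if row.get(name) is None else str(row[name]) for name in order]
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"
```

Keeping the same attribute order in every course's file is one way to let a single clustering configuration be run unchanged across all the datasets.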

This poster offers details on the feature engineering process and preliminary findings.

Text
WiDS2021-agw106-clustering - Accepted Manuscript
Restricted to Repository staff only
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 16 February 2021
Venue - Dates: Women in Data Science Cambridge conference, Online, Stanford, United States, 2021-03-11 - 2021-03-11
Keywords: clustering, feature engineering, educational data

Identifiers

Local EPrints ID: 446677
URI: http://eprints.soton.ac.uk/id/eprint/446677
PURE UUID: 96fe8471-6b48-4d92-a39a-0be0ce6c7a43
ORCID for Adriana Gabriela Wilde: orcid.org/0000-0002-1684-1539

Catalogue record

Date deposited: 17 Feb 2021 17:34
Last modified: 18 Feb 2021 17:26

Contributors

Author: Adriana Gabriela Wilde
