Analysis of English Language Groups with Regular Expressions

On most widely used distance education platforms known as MOOCs (Massive Open Online Courses), lectures are delivered in English, yet participants come from many different countries. This leads to differences in learners' usage behaviors and performance. In our previous studies we tried to divide users into language groups according to their English language proficiency. In this study, we used natural language processing techniques to improve the assignment of students to language groups and to automatically generate per-group datasets from the distance education platform FutureLearn. On FutureLearn, as on other distance education platforms, learners do not have to provide their country when registering. Moreover, for some learners the country provided is where they currently live, which differs from their home country. In such cases it is not possible to determine whether English is their first, official or secondary language. Our study focused on using regex patterns to update learners' language group labels, with the aim of using these labels in future studies such as predicting learners' language groups. The data source is the datasets of the «Understanding Language: Learning and Teaching-4» course on the FutureLearn platform. To update the language groups with natural language processing, we mostly used features such as learners' comments, ids, and country information. By analyzing users' comments, we identified the language group of 63.06% of all users who commented. The three groups are: English as the official and primary language, English as an official but not primary language, and English not an official language.
Of these learners, 78.19% belong to the same language group as the one implied by the country information they provided during registration, while for 21.81% the language group identified from their comments differs from that of their registered country. When only the country information provided at registration is used, fewer learners can be assigned to an English language group, and the assigned groups may be wrong.
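The comment-based relabeling described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the country table, the group names, the single "I am from ..." pattern, and the function names `group_from_comment` and `update_label` are all assumptions made for illustration. The abstract states only that regex patterns were applied to comments, ids, and country information.

```python
import re

# Hypothetical, deliberately tiny lookup table (an assumption, not from the study).
# The three groups mirror the ones named in the abstract.
GROUP_BY_COUNTRY = {
    "united kingdom": "official_primary",
    "australia": "official_primary",
    "india": "official_not_primary",
    "nigeria": "official_not_primary",
    "turkey": "not_official",
    "brazil": "not_official",
}

# Build an alternation of known country names, longest first, so that only
# recognized countries are captured.
_COUNTRIES = "|".join(
    re.escape(name) for name in sorted(GROUP_BY_COUNTRY, key=len, reverse=True)
)

# Illustrative pattern for self-reported origin, e.g. "I am from Turkey" or
# "I'm originally from the United Kingdom".
ORIGIN_PATTERN = re.compile(
    r"\bI(?:'m| am)\s+(?:originally\s+)?from\s+(?:the\s+)?(" + _COUNTRIES + r")\b",
    re.IGNORECASE,
)


def group_from_comment(comment):
    """Return the language group implied by a comment, or None if no match."""
    match = ORIGIN_PATTERN.search(comment)
    if match is None:
        return None
    return GROUP_BY_COUNTRY[match.group(1).lower()]


def update_label(registered_country, comments):
    """Prefer a group inferred from comments; fall back to the registered country."""
    for comment in comments:
        group = group_from_comment(comment)
        if group is not None:
            return group
    if registered_country is None:
        return None
    return GROUP_BY_COUNTRY.get(registered_country.lower())
```

In this sketch a learner registered in the United Kingdom who writes "I am from India" would be relabeled from the comment, which corresponds to the case where the registered country and the comment-derived group disagree.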

FutureLearn, identification of English language groups, MOOC, natural language processing, Regex

10.1109/ASYU.2018.8554018

IEEE

Duru, Ismail

deabd39c-9f2f-4d56-ac2e-bb205a89a55d

Diri, Banu

bb699481-69e7-47fd-9bef-5923de449d8d

Özçevik, M. Emir

d595ab34-7deb-4cd3-a56c-2e576ca853d2

Ataseven, Kerim

822ada7f-a8d9-4e8b-ab0d-3db97a4ea13b

Doğan, Gülüstan

30ad4cd6-1955-4ced-882c-d76a0ec25741

White, Su

5f9a277b-df62-4079-ae97-b9c35264c146

29 November 2018