
README File for the Three Studies Detailed in the Thesis

Dataset DOI: 10.5258/SOTON/PG/D257

ReadMe Author: George-Catalin Muresan, University of Southampton ORCID 0009-0004-8157-3616

This dataset supports the thesis entitled "Bridging Contexts in Video-Based Virtual Dyadic Communication: From Asynchronous Non-Familiar Contexts Using Pre-Recorded Introductions to Synchronous Familiar Contexts Operationalising Engagement Metrics"
AWARDED BY: University of Southampton
DATE OF AWARD: 2026


Date of data collection: September 2020 - November 2024

Information about geographic location of data collection: The dataset includes participants recruited internationally across Europe, North America, and Australia. Data were collected remotely through online experimental sessions. All subsequent data processing and analysis were carried out in the United Kingdom.

Licence:
CC-BY

Related projects:
https://doi.org/10.1016/j.ijhcs.2024.103279 : published research that forms part of this thesis work;
https://doi.org/10.3389/fcomp.2021.661890 : published research that directly motivated the thesis work.



--------------------
DATA & FILE OVERVIEW
--------------------

All data have been pseudonymised prior to deposition. Any identifying information has been removed or generalised to ensure participant confidentiality.

This dataset archive contains the following structure:

/README.txt

/Asynchronous Contexts/
    /Information sheet/
        - 3 PNG files (participant information sheet for asynchronous study)

    /Output data/
        - 6 PNG files (processed statistical output tables used for analysis and findings)

    /Quants and quals/
        - 4 Excel files (raw and processed quantitative and qualitative data)

    /Processing data scripts/
        - filter.py
        - meq.py

        /group_satisfaction/
            - __init__.py
            - group_satisfaction.py
            - setup.py

        /task_performance/
            - performance_analysis.py
            - performance_parser.py

/Synchronous Contexts/
    /Information sheet/
        - 3 PNG files (participant information sheet used across studies)

    /Quants/
        - 1 Excel file (quantitative variables used in study design and analysis context)

    /Study 1/
        /Output data/
            - 4 PNG files (processed statistical output tables used for analysis and findings)

        /Quants and quals/
            - 5 Excel files + 1 PDF (raw and processed quantitative and qualitative data)

    /Study_2/
        /Output data/
            - 7 PNG files (processed statistical output tables used for analysis and findings)

        /Quants and quals/
            - 4 CSV files + 1 Excel file (raw and processed quantitative and qualitative data)


This dataset contains raw, processed, and derived data supporting two contexts: asynchronous dyadic communication (one study) and synchronous dyadic communication (two studies).

The dataset is organised into two main directories: Asynchronous Contexts and Synchronous Contexts.

The directories contain:
- Participant information sheets (PNG format);
- Raw quantitative and qualitative survey data (Excel / CSV / PDF formats);
- Processed statistical outputs used in analysis (PNG tables);
- Python scripts used for preprocessing and metric computation (Asynchronous Contexts directory only).

Processing scripts were used to transform raw survey and interaction data into aggregated metrics used in statistical analysis and figures. Additional statistical analysis was performed using SPSS.


Within each study folder, raw data files (Quants and quals) were processed using the Python scripts located in Processing data scripts and/or using SPSS to generate derived datasets and computed metrics. These processed outputs were then used to produce the statistical tables stored in Output data. In the Synchronous Contexts directory, Study 1 and Study 2 follow the same processing pipeline but contain separate participant groups and experimental conditions.

Additional related data collected that was not included in the current data package: some additional experimental artefacts (e.g. intermediate working files, temporary analysis outputs, and non-final visualisations) were excluded because they are not required to replicate the results. Furthermore, no identifiable video or audio recordings are included, for reasons of participant privacy.

No external datasets were used. All data were collected as part of the studies described in the associated PhD thesis.

This is the final version of the dataset corresponding to the post-viva corrected submission of the PhD thesis. No previous public versions exist.



--------------------------
METHODOLOGICAL INFORMATION
--------------------------

All studies were conducted in accordance with approved ethical procedures. Participants provided informed consent via a sign-up questionnaire that collected demographic information. By completing the questionnaire, participants confirmed that they had read the Participant Information Sheet and agreed to participate in the study.

Description of methods used for collection/generation of data: Data were collected through three controlled online studies involving dyadic video-based communication tasks.

Participants completed structured tasks and post-interaction surveys measuring engagement, satisfaction, task performance, and communication dynamics.

Study 1 (asynchronous context) involved pre-recorded introductions, while Studies 2 and 3 (synchronous context) involved real-time interaction with feedback-based experimental conditions.

In direct association with this thesis, there is one published article: https://doi.org/10.1016/j.ijhcs.2024.103279


Methods for processing the data: Raw survey responses and interaction logs were processed using Python scripts developed for this research and using SPSS.

In Study 1, scripts computed aggregated metrics for engagement, task performance, and group satisfaction.
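As an illustration of the kind of aggregation described above, the sketch below averages a participant's answered Likert items into a single scale score. It is a minimal, hypothetical sketch and not the logic of meq.py, group_satisfaction.py or the task_performance scripts; the item names and the rule of averaging only the answered items are assumptions.

```python
from statistics import mean
from typing import Optional

# Hypothetical sketch: item names are placeholders, not column names from the
# dataset, and averaging only the answered items is an assumed scoring rule.


def scale_score(row: dict[str, Optional[str]], items: list[str]) -> Optional[float]:
    """Return the mean of the answered Likert items in `items`, or None if none were answered."""
    answered = [int(row[item]) for item in items if row.get(item) not in (None, "")]
    return float(mean(answered)) if answered else None


if __name__ == "__main__":
    response = {"participant": "P1", "eng_1": "4", "eng_2": "5", "eng_3": None}
    print(scale_score(response, ["eng_1", "eng_2", "eng_3"]))  # 4.5
```

Requiring every item to be answered would be a stricter alternative; the rule actually applied should be checked in the deposited scripts.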

For the remaining analyses of Study 1, and for Studies 2 and 3, final statistical analyses were conducted using SPSS, including descriptive statistics and inferential testing (e.g. ANOVA where applicable).
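The inferential tests were run in SPSS. For readers who want to cross-check a one-way between-subjects ANOVA from the deposited data without SPSS, the following sketch computes the F statistic and its degrees of freedom from first principles. It assumes independent groups, does not compute a p-value (that requires the F distribution, e.g. from SciPy), and the group values in the example are invented.

```python
from statistics import mean


def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: spread of observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    if df_within == 0 or ss_within == 0:
        raise ValueError("within-group variance is zero or undefined")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # (13.5, 1, 4)
```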

Processed outputs were exported as tables and figures for inclusion in the thesis.


Software- or Instrument-specific information needed to interpret the data: Python was used for data preprocessing and metric computation (custom scripts included in dataset). Statistical analysis was performed using IBM SPSS Statistics. Survey data were collected using online experimental platforms (web-based forms, e.g. Google Survey).

Standards and calibration information, if appropriate: No physical measurement instruments or calibration procedures were used. All data were collected via online experimental systems.

Environmental/experimental conditions: All studies were conducted remotely in online video-based experimental conditions with participants located internationally. Experiments were performed in controlled task-based communication sessions under structured experimental protocols.

Describe any quality-assurance procedures performed on the data: Data cleaning procedures included translation of responses into English where applicable, removal of incomplete responses, and verification of consistency across survey and task performance metrics.
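The removal of incomplete responses can be sketched as a simple filter. This is an illustration only: the README does not state which fields had to be present for a response to count as complete, so the `required` column list is an assumption, and missing values are assumed to have already been normalised to None or empty strings. Translation and consistency checks are not reproduced here.

```python
from typing import Iterable, Optional

Row = dict[str, Optional[str]]


def drop_incomplete(rows: Iterable[Row], required: Iterable[str]) -> list[Row]:
    """Keep only rows in which every required column holds a non-missing value."""
    required = list(required)  # materialise so it can be iterated once per row
    return [
        row for row in rows
        if all(row.get(col) not in (None, "") for col in required)
    ]


if __name__ == "__main__":
    # Hypothetical responses; "q1" is a placeholder column name.
    responses = [
        {"participant": "P1", "q1": "3"},
        {"participant": "P2", "q1": None},
        {"participant": "P3"},
    ]
    kept = drop_incomplete(responses, ["participant", "q1"])
    print([row["participant"] for row in kept])  # ['P1']
```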

People involved with sample collection, processing, analysis and/or submission: Data collection, processing, and analysis were conducted by George-Catalin Muresan. Sebastian Mititelu contributed to the development and refinement of Python processing scripts used for data transformation and metric computation, as detailed in the published work. Supervision and research oversight were provided by the thesis supervisory team as listed in the thesis acknowledgements.


--------------------------
DATA-SPECIFIC INFORMATION
--------------------------

This dataset contains multiple data types across three studies in two contexts (asynchronous and synchronous). Data files are grouped into raw survey datasets (Excel/CSV/PDF), processed statistical outputs (PNG tables), and analysis scripts (Python).

1. Survey datasets (Excel, CSV, PDF)
These files contain raw and partially processed survey responses collected from participants during experimental sessions.

Number of variables:
Varies by study and questionnaire (typically includes 5-point Likert-scale items, the Office for National Statistics (ONS) Harmonised Demographic Questions, and task performance measures such as speed, accuracy, and counts).

Number of cases/rows:
Varies by study and dataset version (each row corresponds to one participant or one dyadic interaction, depending on study design).
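Because some files are organised per participant and others per dyadic interaction, a participant-level score may need to be aggregated to the dyad level before comparison. A minimal sketch follows; the `dyad` and `satisfaction` column names are hypothetical, and averaging the two partners' scores is one possible aggregation, not necessarily the one used in the thesis.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Iterable


def dyad_means(rows: Iterable[dict[str, Any]], dyad_key: str, score_key: str) -> dict[str, float]:
    """Average a participant-level score within each dyad, skipping missing scores."""
    scores: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        score = row.get(score_key)
        if score in (None, ""):
            continue  # missing score: excluded from the dyad mean
        scores[row[dyad_key]].append(float(score))
    return {dyad: mean(values) for dyad, values in scores.items()}


if __name__ == "__main__":
    # Hypothetical participant-level rows.
    participants = [
        {"participant": "P1", "dyad": "D1", "satisfaction": "4"},
        {"participant": "P2", "dyad": "D1", "satisfaction": "5"},
        {"participant": "P3", "dyad": "D2", "satisfaction": "3"},
        {"participant": "P4", "dyad": "D2", "satisfaction": None},
    ]
    print(dyad_means(participants, "dyad", "satisfaction"))  # {'D1': 4.5, 'D2': 3.0}
```

A dyad in which both partners are missing the score does not appear in the result.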

Variables include:
- Participant identifiers (anonymised, e.g. P1, P2);
- Demographic fields (e.g. age group, region);
- 5-point Likert-scale survey responses (e.g. engagement, satisfaction, trust);
- Task performance metrics (e.g. completion scores and times, interaction measures);
- Communication behaviour indicators (e.g. speaking time, balance metrics).
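The README does not define the balance metrics. One common way to summarise how evenly two partners share a conversation is to compare their speaking times; the sketch below computes each partner's share and a balance index of 1 - |a - b| / (a + b), where 1 means perfectly even and 0 means only one person spoke. This formula is an illustrative assumption, not the definition used in the thesis; the thesis itself should be consulted for the actual metric.

```python
def speaking_balance(seconds_a: float, seconds_b: float) -> tuple[float, float, float]:
    """Return (share_a, share_b, balance) for two partners' speaking times.

    Illustrative definition only (an assumption): balance = 1 - |a - b| / (a + b).
    """
    if seconds_a < 0 or seconds_b < 0:
        raise ValueError("speaking times must be non-negative")
    total = seconds_a + seconds_b
    if total == 0:
        raise ValueError("no speech recorded for either partner")
    share_a = seconds_a / total
    share_b = seconds_b / total
    balance = 1 - abs(seconds_a - seconds_b) / total
    return share_a, share_b, balance


if __name__ == "__main__":
    print(speaking_balance(90.0, 30.0))  # (0.75, 0.25, 0.5)
```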

Missing responses are either left blank or encoded as NA depending on export format.
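Since missing values appear either as blank cells or as NA, it helps to normalise both encodings before analysis. The following is a minimal sketch for the CSV files using only the standard library; the column names in the example are hypothetical, and the Excel files would first need exporting to CSV or reading with a spreadsheet library such as pandas.

```python
import csv
import io
from typing import Optional

# Tokens treated as missing, following the README: blank cells and "NA".
MISSING_TOKENS = {"", "NA"}


def parse_cell(value: Optional[str]) -> Optional[str]:
    """Return None for a missing cell, otherwise the value with surrounding whitespace removed."""
    if value is None:  # csv.DictReader yields None for fields absent from a short row
        return None
    value = value.strip()
    return None if value in MISSING_TOKENS else value


def load_survey_csv(text: str) -> list[dict[str, Optional[str]]]:
    """Parse CSV text and normalise both missing-value encodings to None."""
    reader = csv.DictReader(io.StringIO(text))
    # Extra fields beyond the header are stored under the key None; they are dropped here.
    return [
        {key: parse_cell(value) for key, value in row.items() if key is not None}
        for row in reader
    ]


if __name__ == "__main__":
    sample = "participant,q1,q2\nP1,4,NA\nP2,,5\n"
    for row in load_survey_csv(sample):
        print(row)
```

To read a file on disk, the text can be obtained with `open(path, newline="", encoding="utf-8").read()`; the encoding is an assumption about how the files were exported.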

Specialised formats or abbreviations:
- Likert scales typically range from 1–5;
- Dyadic identifiers are anonymised IDs: P1, P2 etc.
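Because Likert items are expected to lie between 1 and 5, a quick range check can flag data-entry or export errors before analysis. The sketch below is an assumption-based illustration: item names are hypothetical, and the 1 to 5 default follows the README but can be changed for any scale that differs.

```python
from typing import Optional


def invalid_likert_cells(
    row: dict[str, Optional[str]], items: list[str], low: int = 1, high: int = 5
) -> list[str]:
    """Return the items whose non-missing value is not a whole number in [low, high]."""
    invalid = []
    for item in items:
        value = row.get(item)
        if value in (None, ""):
            continue  # missing values are handled separately, not flagged here
        try:
            number = float(value)
        except ValueError:
            invalid.append(item)  # non-numeric text, e.g. a stray label
            continue
        if not number.is_integer() or not low <= number <= high:
            invalid.append(item)
    return invalid


if __name__ == "__main__":
    response = {"q1": "3", "q2": "6", "q3": "abc", "q4": None, "q5": "4.0"}
    print(invalid_likert_cells(response, ["q1", "q2", "q3", "q4", "q5"]))  # ['q2', 'q3']
```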

Date created: 2020–2025 (data collected and processed across study periods)


2. Processed statistical outputs (PNG tables)

These files contain aggregated statistical results generated from the raw datasets through analysis in SPSS.

Number of variables: Not applicable (derived summary outputs)

Number of cases/rows: Not applicable (aggregated statistical summaries)

Variable list: Not applicable (visualised statistical outputs)

Missing data codes: Not applicable

Specialised formats: Tables represent aggregated statistical outputs such as means, standard deviations, and inferential test results.

Date created: 2020–2025 (post-processing and analysis phase)
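When recomputing the means and standard deviations shown in these tables from the raw files, note that SPSS's standard descriptive procedures report the sample standard deviation (n - 1 denominator). The minimal sketch below uses Python's `statistics.stdev`, which follows the same convention; the input values are invented for illustration.

```python
from statistics import mean, stdev


def describe(values: list[float]) -> dict[str, float]:
    """Return n, mean and sample standard deviation (n - 1 denominator)."""
    if len(values) < 2:
        raise ValueError("at least two values are needed for a sample standard deviation")
    return {"n": len(values), "mean": float(mean(values)), "sd": stdev(values)}


if __name__ == "__main__":
    print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```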
