README file for the 'Predicting behavioural speech reception threshold using cortical responses to continuous sound' dataset

Dataset DOI: https://doi.org/10.5258/SOTON/D2578

README author: Suwijak Deoisres, University of Southampton (sd1n17@soton.ac.uk)

This dataset supports the PhD thesis: "Auditory cortical responses as an objective predictor of speech-in-noise performance" [specifically chapter 6]

---------------------
Contents
---------------------
- Overview for EEG data
- SIN_EEG
- SIN_EEG_Da
- SIN_EEG_python_script
- SIN_SRT
- SIN_Stimuli
- SIN_Stimuli_Order

---------------------
Overview for EEG data
---------------------
* All EEG data were recorded and saved as .bdf files using a Biosemi ActiveTwo 32-channel system.
* The trigger channel is channel number 48 in the EEG .bdf file.
* EEG data were collected from 22 subjects (S01 to S22).
* A Python script named "Load_bdf.py" (in SIN_EEG_python_script) provides an example of how the raw data can be extracted (additional Python packages may be needed).

---------------------
SIN_EEG (divided into 4 .rar files)
---------------------
- Raw (unprocessed and unreferenced) recordings of cortical responses to natural speech and modulated noise.
- Files containing '_speech' and '_noise' are EEG responses to natural speech and modulated noise without background noise, respectively. These are approximately 15 minutes of recorded EEG used to train the temporal response function (TRF).
- There are triggers to indicate segments in each recording, which correspond to the triggers in the 'odin_04_speech_full_65dB_trig.wav' and 'odin_04_mod_amp_full_65dB_trig.wav' audio files.
- Files containing '_MixedSpeechNoise' are EEG responses to natural speech and modulated noise at 5 levels of background noise: no noise, 0, -5, -10, and -15 dB SNR.
- The EEG triggers in these files indicate 2-minute segments of responses to speech or modulated noise at a certain SNR.
- The '_MixedSpeechNoise' files should be viewed together with the "Subject_SNRpresent_order_speech_and_noise" spreadsheet (in SIN_Stimuli_Order) to identify which SNR and stimulus condition the segments in the EEG correspond to.
- These EEG recordings were used as the testing data for the TRF.
- Specifically for subjects 2 and 3 (S03 and S04), the EEG data in the no-background-noise condition are 30 minutes long (2 times longer than for the other subjects).
- These two subjects also have no EEG responses to the repeating /da/ stimuli.

---------------------
SIN_EEG_Da
---------------------
- Raw (unprocessed and unreferenced) recordings of cortical auditory evoked potentials (CAEP) in response to 1000 /da/ stimuli (alternating polarity) at 5 SNR levels: no noise, 0, -5, -10, and -15 dB (200 epochs for each SNR condition).
- Triggers in the EEG responses to /da/ need to be viewed together with the 'Subject_SNRpresent_order_da' spreadsheet to identify the order of SNR conditions presented.
- This folder also contains a Python script to help with segmenting the EEG responses to /da/ at each SNR level, and with obtaining the CAEP through coherent averaging.

---------------------
SIN_EEG_python_script
---------------------
- Some Python scripts are provided, mainly to help with the data segmentation (order of randomised SNR conditions in the '_MixedSpeechNoise' files) after EEG pre-processing.
- The scripts also perform the EEG speech envelope decoding, and bootstrapping to test the significance of the speech envelope reconstruction correlation coefficient.

---------------------
SIN_SRT
---------------------
- The spreadsheets contain each participant's Matrix sentence test scores (in percentage of words correct) at each SNR level (no noise, 0, -5, -10, and -15 dB).
- The first row indicates the testing SNR level in dB (20 dB for the no-noise condition).
- Down each column, the numbers in the cells from the 2nd to the 21st row are the percentages of words correct for each test sentence (20 sentences in total, each containing 5 words).
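The epoch-blocking and coherent-averaging step described in SIN_EEG_Da above (200 consecutive /da/ epochs per SNR condition, in the randomised order given by 'Subject_SNRpresent_order_da') can be sketched as follows. This is an illustrative NumPy sketch, not the script shipped with the dataset; it assumes the epochs have already been pre-processed and cut around the EEG triggers, and the function name and array layout are the author's illustration only.

```python
import numpy as np

def caep_by_snr(epochs, snr_order, n_per_snr=200):
    """Coherently average consecutive blocks of /da/ epochs, one block per SNR.

    epochs    : array (n_epochs, n_channels, n_samples), in presentation order
    snr_order : SNR labels for one subject, e.g. that subject's column in the
                'Subject_SNRpresent_order_da' spreadsheet (1..5 -> no noise,
                0, -5, -10, -15 dB SNR)
    Returns a dict mapping SNR label -> averaged CAEP (n_channels, n_samples).
    """
    caeps = {}
    for i, snr in enumerate(snr_order):
        block = epochs[i * n_per_snr:(i + 1) * n_per_snr]
        caeps[snr] = block.mean(axis=0)  # coherent (time-locked) average
    return caeps

# Synthetic demo: 1000 epochs, 32 channels, 100 samples each (random data,
# standing in for pre-processed EEG)
rng = np.random.default_rng(0)
epochs = rng.standard_normal((1000, 32, 100))
order = [1, 3, 2, 5, 4]  # hypothetical randomised SNR order for one subject
caeps = caep_by_snr(epochs, order)
```

Following the last two bullets of SIN_Stimuli_Order, the first 200 epochs map to the first SNR label in the subject's column, epochs 201-400 to the second, and so on.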
---------------------
SIN_Stimuli
---------------------
- All audio files are in .wav format, 44.1 kHz sampling frequency.
- The stimulation sound is in the right channel. Audio triggers to align with the EEG triggers are in the left channel.
- Files 'odin_04_speech_full_65dB_trig.wav' and 'odin_04_mod_amp_full_65dB_trig.wav' were used to evoke the EEG responses to speech and modulated noise in the no-background-noise condition (15 minutes).
- Specifically for subjects 2 and 3 (S03 and S04), files 'odin_04_and_05_speech_full_65dB_trig.wav' and 'odin_04_and_05_mod_amp_full_65dB_trig.wav' were used for the same purpose as above; the only difference is the duration (30 minutes).
- Files 'odin_01_speech_full_65dB_2minsx4_trig.wav' and 'odin_01_mod_amp_full_65dB_2minsx4_trig.wav' are audio files with four 2-minute segments, used to generate the EEG responses with '_MixedSpeechNoise' in the file name.
- These 4 segments were played in 5 SNR conditions (8 minutes of EEG responses at each SNR level).
- The two files need to be viewed together with the 'Subject_SNRpresent_order_speech_and_noise' spreadsheet to identify the order of SNR levels and sound conditions presented.
- File 'Da_1.11sISI_200_reps_65dB_trig.wav' was used to generate the EEG responses to /da/.

---------------------
SIN_Stimuli_Order
---------------------
- The 'Subject_SNRpresent_order_speech_and_noise' and 'Subject_SNRpresent_order_da' spreadsheets contain information about the order of stimulus segments and sound conditions presented to each participant.
- In the 'Subject_SNRpresent_order_speech_and_noise' file, each column shows the order of stimulus segments and sound conditions presented to one participant (e.g., columns A, B, C for subjects 1, 2, 3, etc.).
- The numbers in the cells (from 1 to 40) down each column correspond to the SNR and sound condition as shown in the tables below.
*** Natural speech condition ***
(within each row, the numbers follow the order of the segments in 'odin_01_speech_full_65dB_2minsx4_trig.wav'; e.g., 25, 26, 27, and 28 correspond to the 1st, 2nd, 3rd, and 4th segment in the audio)

SNR      | number in cell
--------------------------------
no noise | 1, 2, 3, 4
-5 dB    | 9, 10, 11, 12
-15 dB   | 17, 18, 19, 20
0 dB     | 25, 26, 27, 28
-10 dB   | 33, 34, 35, 36

*** Modulated noise condition ***
(within each row, the numbers follow the order of the segments in 'odin_01_mod_amp_full_65dB_2minsx4_trig.wav'; e.g., 13, 14, 15, and 16 correspond to the 1st, 2nd, 3rd, and 4th segment in the audio)

SNR      | number in cell
--------------------------------
0 dB     | 5, 6, 7, 8
-10 dB   | 13, 14, 15, 16
no noise | 21, 22, 23, 24
-5 dB    | 29, 30, 31, 32
-15 dB   | 37, 38, 39, 40

- In the 'Subject_SNRpresent_order_da' file, each column shows the order of SNR conditions in which the /da/ stimuli were presented to one participant (e.g., columns A, B, C for subjects 1, 2, 3, etc.).
- The numbers 1, 2, 3, 4, and 5 in the cells down each column correspond to the no noise, 0, -5, -10, and -15 dB SNR conditions, respectively.
- At each SNR level, 200 /da/ stimuli were presented before progressing to the next SNR level.
- Use the first 200 epochs indicated by the EEG triggers to extract the CAEP to /da/ in the first SNR condition.
- The next 200 epochs after that (epochs 201 to 400) will give the CAEP to /da/ in the second SNR condition, and so on.

*************************************************

Geographic location of data collection: University of Southampton, U.K.
Dataset available under a CC BY 4.0 licence
Publisher: University of Southampton, U.K.
Date: March 2023
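As a worked illustration of the cell-numbering scheme in the two tables of SIN_Stimuli_Order above, the mapping from a cell number (1-40) to stimulus condition, SNR, and segment number can be written as a small helper. This is an illustrative sketch only (not part of the dataset's scripts); the function name and return format are the author's assumptions, while the lookup values come directly from the tables.

```python
# SNR order of the 4-number blocks, as listed in the tables above
SPEECH_SNRS = ["no noise", "-5 dB", "-15 dB", "0 dB", "-10 dB"]
NOISE_SNRS = ["0 dB", "-10 dB", "no noise", "-5 dB", "-15 dB"]

def decode_cell(n):
    """Map a cell number (1-40) from 'Subject_SNRpresent_order_speech_and_noise'
    to (stimulus condition, SNR, segment number within the audio file)."""
    if not 1 <= n <= 40:
        raise ValueError("cell numbers run from 1 to 40")
    # Numbers come in consecutive blocks of 4 (one per 2-minute segment),
    # alternating between the speech and modulated-noise tables.
    block, segment = divmod(n - 1, 4)
    if block % 2 == 0:
        return ("speech", SPEECH_SNRS[block // 2], segment + 1)
    return ("modulated noise", NOISE_SNRS[block // 2], segment + 1)

# e.g. decode_cell(25) -> ("speech", "0 dB", 1), matching the tables above
```

Reading a subject's column top to bottom and decoding each cell this way gives the presentation order of the forty 2-minute '_MixedSpeechNoise' segments for that subject.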