READ ME File For 'Datasets and Syntax: Pupil Competence and COVID-19 School Closures: Evidence on Distance Learning and Remediation Policies from International Assessments in 30 Countries'

Dataset DOI: https://doi.org/10.5258/SOTON/D3644

ReadMe Author: YIN WANG, University of Southampton ORCID ID https://orcid.org/0009-0004-9440-2410

This dataset supports the thesis entitled: Pupil Competence and COVID-19 School Closures: Evidence on Distance Learning and Remediation Policies from International Assessments in 30 Countries
AWARDED BY: University of Southampton
DATE OF AWARD: 2025

DESCRIPTION OF THE DATA

This dataset accompanies the PhD thesis Pupil Competence During the COVID-19-induced School Closures: An Analysis of the Effect of Distance Learning and Remediation Policies Using International Assessment Data in 30 Countries. The dataset compiles country-level data derived from large-scale international student assessments, specifically PISA and PIRLS, covering the period 2000–2022. It was created by harmonising publicly available microdata from the OECD (for PISA) and the IEA (for PIRLS) and aggregating them to the national level. The data were compiled and processed in StataNow 18.5, and the resulting .dta files can be opened in StataNow 18.5. Stata .do files are also provided to allow full reproducibility of the data preparation and analysis. The dataset is specifically structured to support advanced statistical modelling, including Latent Growth Curve Modelling (LGCM), Synthetic Control (SC), and Synthetic Difference-in-Differences (SDID), to examine the effects of COVID-19 policies on pupil competence across diverse national contexts.

This dataset contains:

Dataset1: PISA Country-Level Achievement Pseudo-Panel Datasets for LGCM (PseudoPanel_SES_Trisection.dta; PseudoPanel_SES_Quantile.dta; PseudoPanel_SES_ParentEduOnly.dta)

Do File name: Latent Growth Curve Modeling (LGCM) with PISA Data.do

Dataset2: PIRLS LGCM Datasets (PseudoPanel_PIRLS_SES_Trisection.dta; PseudoPanel_PIRLS_SES_Quantile.dta; PseudoPanel_PIRLS_SES_ParentEduOnly.dta)

Do File name: Latent Growth Curve Modeling (LGCM) with PIRLS Data.do

Do File name: PIRLS Data Robust Check PIRLS Waves.do

Dataset3: PISA Synthetic Control Analysis Dataset (Synthetic Control Analysis of COVID-19 Distance Learning and Remediation Policies with PISA Mathematics Data.do; Synthetic Control Analysis of COVID-19 Distance Learning and Remediation Policies with PISA Reading Data.do; Synthetic Control Analysis of COVID-19 Distance Learning and Remediation Policies with PISA Science Data.do)

Do File Name: Synthetic Control Analysis of COVID-19 Distance Learning and Remediation Policies with PISA Mathematics / Reading / Science Data.do

Dataset4: PIRLS Synthetic Control Analysis Dataset

Do File Name: PIRLS Reading Synthetic Control command.do

Do File Name: SDID_Analysis_PISA_Mathematics.do; SDID_Analysis_PISA_Reading.do; SDID_Analysis_PISA_Science.do

Do File Name: DID_Analysis_PIRLS_Reading.do

Do File Name: SDID Parallel Trends Robustness Check for PISA Dataset.do

Do File Name: SDID Parallel Trends Robustness Check for PIRLS Dataset.do

Date of data collection: 2022.01-2024.05

Licence:
The original data used in this dataset were obtained from the OECD PISA Database and the IEA PIRLS Database.

PISA data are provided under the OECD Terms and Conditions, available for research and educational purposes only, and must not be used for commercial purposes. Users must cite the OECD PISA Database as the data source.

PIRLS data are provided under the IEA Data Repository Terms of Use, available for research and educational purposes only, and must not be used for commercial purposes. Users must cite the IEA PIRLS Database as the data source.

This archived dataset represents derived data compiled by the author from PISA and PIRLS sources. Users must comply with the OECD and IEA data usage policies when reusing these data.

Related projects/Funders:
Self-funded PhD research

Related publication:
PhD thesis (in preparation)

Date that the file was created: August 2025






Datasets: PISA Country-Level Mean Achievement Pseudo-Panel Datasets for LGCM
Description
This dataset contains three versions of country-level pseudo-panel data constructed from the Programme for International Student Assessment (PISA) sweeps between 2000 and 2022. The datasets were created to enable Latent Growth Curve Modelling (LGCM) at the group level by aggregating individual-level PISA data.
The central focus is the operationalisation of Socioeconomic Status (SES), a key variable in the analysis. Since SES is not directly available in a harmonised format across cycles, we constructed three versions of SES groupings for robustness checks:
1. Trisection SES (Primary dataset)
o SES constructed using Principal Component Analysis (PCA) of parents' highest years of schooling and the International Socio-Economic Index of Occupational Status (ISEI).
o Categorised into Low / Medium / High groups by dividing the range of the SES index into three equal-width sections (cut points at one third and two thirds of the distance from the minimum to the maximum SES value).
o Used as the main dataset for LGCM.
2. Quantile SES (Robustness check A)
o Same PCA-based SES variable (parental education + ISEI).
o Categorised into Low / Medium / High groups based on sample quantiles, ensuring each group contains ~one third of observations.
o Used as a robustness check against classification thresholds.
3. Parental Education Only SES (Robustness check B)
o SES is constructed solely from the average of the highest years of schooling of pupils' parents.
o Provides a robustness check on the role of occupational status.
Variables
* Means = Country-level mean student achievement (Math/Reading/Science)
* Time_Num = Numeric time variable for growth (e.g., 1–6 corresponding to sweeps)
* Subject = Subject indicator (1=Math, 2=Reading, 3=Science)
* SchoolClosure = Days of school closure (constructed for the 2022 cycle)
* teacher_students_ratio = Teacher-to-student ratio (country-level)
* Age_of_SchoolTracking = Country-level tracking age
* Sex = Student gender indicator (0=male, 1=female)
* IMMIG = Immigration background (0=native, 1=first gen, 2=second gen)
* SES = Socioeconomic Status (categorised differently across the three datasets)
* CNTRYID = Country identifier
* SWEEP = Survey year (2000–2022)
Files
* PseudoPanel_SES_Trisection.dta: Main dataset (Trisection SES)
* PseudoPanel_SES_Quantile.dta: Robustness check A (Quantile SES)
* PseudoPanel_SES_ParentEduOnly.dta: Robustness check B (Parental education only)
Notes
* These datasets are aggregated pseudo-panels at the country level. No individual-level student data are included.
* All other variables included in the dataset were used for testing purposes only and can be ignored. 
* They are designed for use with the Latent Growth Curve Modeling (LGCM) with PISA Data.do syntax file (see syntax README).
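Each pseudo-panel row is a group mean rather than a student record. A minimal Python sketch of that collapse step is shown below for illustration only. It assumes cells are defined by country, sweep, SES group, sex and immigrant background, and that means are weighted by the student weight; the cell definition and the field names `score` and `weight` are hypothetical, and the toy records are invented.

```python
from collections import defaultdict

CELL_KEYS = ("CNTRYID", "SWEEP", "SES", "Sex", "IMMIG")  # assumed cell definition


def aggregate_pseudo_panel(students, keys=CELL_KEYS, score="score", weight="weight"):
    """Collapse student records into weighted cell means (one value per cell)."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for row in students:
        cell = tuple(row[k] for k in keys)
        weighted_sum[cell] += row[weight] * row[score]
        weight_total[cell] += row[weight]
    return {
        cell: weighted_sum[cell] / weight_total[cell]
        for cell in weighted_sum
        if weight_total[cell] > 0
    }


students = [
    {"CNTRYID": 826, "SWEEP": 2018, "SES": 1, "Sex": 0, "IMMIG": 0, "score": 400, "weight": 1.0},
    {"CNTRYID": 826, "SWEEP": 2018, "SES": 1, "Sex": 0, "IMMIG": 0, "score": 500, "weight": 3.0},
    {"CNTRYID": 826, "SWEEP": 2022, "SES": 1, "Sex": 0, "IMMIG": 0, "score": 480, "weight": 2.0},
]
print(aggregate_pseudo_panel(students))
# {(826, 2018, 1, 0, 0): 475.0, (826, 2022, 1, 0, 0): 480.0}
```

The same cell followed across sweeps is what the growth model then treats as a "unit", which is why no individual student data are needed in the released files.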

Do File name: Latent Growth Curve Modeling (LGCM) with PISA Data.do
Date: Aug 2025
Software: StataNow 18.5
Purpose
This do-file contains the syntax used for latent growth curve modeling (LGCM) analyses of aggregated PISA datasets. It prepares the dataset, specifies unconditional and conditional growth models, and estimates the impact of school closure policies on student performance.
Data
* The analysis is based on restricted-use PISA (2003–2022) data.
* Raw individual-level microdata are NOT included here due to OECD copyright restrictions. Users must obtain the original data from the official OECD PISA data portal.
Variables 
Full variable definitions can be found in the Data dictionary section at the beginning of the DO file.
Usage
1. Load the prepared dataset (PISA Country-Level Mean Achievement Pseudo-Panel Datasets for Latent Growth Curve Modelling).
2. Run the DO file in StataNow 18.5.
3. The script estimates unconditional and conditional LGCMs, with variations including quadratic time effects and interaction terms.
4. Models are fit separately by subject (Math, Reading, Science).
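In an LGCM, time enters through fixed factor loadings rather than as a regressor. The Python sketch below shows that coding for illustration. It assumes sweeps are numbered consecutively (as Time_Num is) and that time is centred at the first sweep, so the intercept factor refers to the first observed sweep; the centring choice and the function names are assumptions, not taken from the do-file.

```python
def time_num(sweeps):
    """Map survey years to consecutive Time_Num values 1, 2, ... in chronological order."""
    return {year: t for t, year in enumerate(sorted(set(sweeps)), start=1)}


def lgcm_loadings(n_waves, quadratic=False, centre_at_first=True):
    """Rows = waves; columns = loadings on intercept, linear slope and (optionally) quadratic slope."""
    rows = []
    for t in range(1, n_waves + 1):
        s = t - 1 if centre_at_first else t
        row = [1, s]  # intercept loading fixed at 1, linear loading at the time score
        if quadratic:
            row.append(s * s)  # quadratic loading = squared time score
        rows.append(row)
    return rows


print(time_num([2018, 2003, 2022, 2003]))  # {2003: 1, 2018: 2, 2022: 3}
for row in lgcm_loadings(4, quadratic=True):
    print(row)
# [1, 0, 0]
# [1, 1, 1]
# [1, 2, 4]
# [1, 3, 9]
```

The unconditional model estimates the means and variances of these growth factors; the conditional models add covariates (for example SchoolClosure) that help explain them.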

Datasets: PIRLS LGCM Datasets
Description
This dataset contains three versions of country-level pseudo-panel data constructed from the Progress in International Reading Literacy Study (PIRLS) sweeps between 2001 and 2021. The datasets were created to enable Latent Growth Curve Modelling (LGCM) at the group level by aggregating individual-level PIRLS data.
The central focus is the operationalisation of Socioeconomic Status (SES), a key variable in the analysis. Since SES is not directly available in a harmonised format across PIRLS cycles, we constructed three versions of SES groupings for robustness checks:
1. Trisection SES (Primary dataset)
o SES constructed using Principal Component Analysis (PCA) of parents' highest years of schooling and the International Socio-Economic Index of Occupational Status (ISEI).
o Categorised into Low / Medium / High groups by dividing the range of the SES index into three equal-width sections (cut points at one third and two thirds of the distance from the minimum to the maximum SES value).
o Used as the main dataset for LGCM.
2. Quantile SES (Robustness check A)
o Same PCA-based SES variable (parental education + ISEI).
o Categorised into Low / Medium / High groups based on sample quantiles, ensuring each group contains ~one third of observations.
o Used as a robustness check against classification thresholds.
3. Parental Education Only SES (Robustness check B)
o SES is constructed solely from the average of the highest years of schooling of pupils' parents.
o Provides a robustness check on the role of occupational status.
Variables
* Means = Country-level mean reading achievement
* Time_Num = Numeric time variable for growth (e.g., 1–5 corresponding to sweeps)
* Time_Num_Quadratic = Squared term of Time_Num
* SchoolClosure = Days of school closure (constructed for the 2021 cycle)
* teacher_students_ratio = Teacher-to-student ratio (country-level)
* Age_of_SchoolTracking = Country-level tracking age
* Sex = Student gender indicator (0=male, 1=female)
* IMMIG = Immigration background (0=native, 1=first gen, 2=second gen)
* SES = Socioeconomic Status (categorised differently across the three datasets)
* CNTRYID = Country identifier
* SWEEP = Survey year (2001–2021)
* PIRLS = PIRLS-cycle indicator
Files
* PseudoPanel_PIRLS_SES_Trisection.dta: Main dataset (Trisection SES)
* PseudoPanel_PIRLS_SES_Quantile.dta: Robustness check A (Quantile SES)
* PseudoPanel_PIRLS_SES_ParentEduOnly.dta: Robustness check B (Parental education only)
Notes
* These datasets are aggregated pseudo-panels at the country level. No individual-level student data are included.
* All other variables included in the dataset were used for testing purposes only and can be ignored.
* They are designed for use with the Latent Growth Curve Modeling (LGCM) with PIRLS Data.do syntax file (see syntax README).

Do File name: Latent Growth Curve Modeling (LGCM) with PIRLS Data.do
Date: Aug 2025
Software: StataNow 18.5
Purpose
This do-file contains the syntax used for latent growth curve modeling (LGCM) analyses of aggregated PIRLS datasets. It prepares the dataset, specifies unconditional and conditional growth models, and estimates the impact of school closure policies on student performance.
Data
* The analysis is based on restricted-use PIRLS (2001–2021) data.
* Raw individual-level microdata are NOT included here due to IEA copyright restrictions. Users must obtain the original data from the official IEA PIRLS data portal.
Variables
Full variable definitions can be found in the Data dictionary section at the beginning of the DO file. 
Usage
1. Load the prepared PIRLS dataset (PIRLS Country-Level Mean Achievement Pseudo-Panel Datasets for Latent Growth Curve Modelling).
2. Run the DO file in StataNow 18.5.
3. The script estimates unconditional and conditional LGCMs, with variations including quadratic time effects, school closure effects, and subgroup interactions.

Do File name: PIRLS Data Robust Check PIRLS Waves.do
Date: Aug 2025
Software: StataNow 18.5
Purpose
This do-file contains the syntax for latent growth curve modeling (LGCM) analyses of aggregated PIRLS datasets.
It prepares the dataset, specifies unconditional and conditional growth models, and evaluates the impact of school closure policies on student performance.
A central focus is on robustness checks: the models are re-estimated separately for different PIRLS Waves (Wave 1, Wave 2, Wave 3) to test the stability of results across cohorts of participating countries.
Data
* The analysis is based on restricted-use PIRLS (2001–2021) data.
* Raw individual-level microdata are not included here due to IEA copyright restrictions.
* Users must obtain the official PIRLS data from the IEA data portal.
Variables: Full definitions can be found in the Data dictionary section at the beginning of the DO file.
Usage
1. Load the prepared PIRLS dataset: "PseudoPanel_PIRLS_SES_Trisection.dta".
2. Run the do-file in StataNow 18.5.
3. The script performs robustness checks by PIRLS wave: controlling for wave fixed effects, restricting the sample to Waves 1 and 2, and restricting it to Wave 1 only.

Dataset: PISA Synthetic Control Analysis Dataset
Description: These datasets provide country-level panel data from PISA sweeps (2000–2022), designed for constructing and visualising parallel trends to evaluate the impact of COVID-19 distance learning and remediation policies using Synthetic Control methods.
Each subject (Mathematics, Reading, Science) is analysed with its own syntax file, using the PISA Synthetic Control Analysis Dataset.
Key features:
* Outcome: country-level mean achievement by subject, sweep and country.
* Treatments: policy-specific dummies that proxy COVID-19 education responses.
* Covariates: teacher–student ratio, age of school tracking, and school closure days.
Variables
* Means: Country-level mean student achievement (subject-specific).
* CNTRYID: Country identifier.
* Time: PISA sweep year (2000–2022).
* Subject: Subject indicator (1=Math, 2=Reading, 3=Science).
* teacher_students_ratio: Teacher–student ratio.
* Age_of_SchoolTracking: Age of school tracking (average).
* SchoolClosure: Number of school closure days during COVID-19.
* weight: Placeholder for donor weights; can be replaced with post-estimation weights.
Policy treatment dummies:
* DiD_IIT: Increasing Instructional Time
* DiD_TPFST: Tutoring Programs / Financial Support for Tutoring
* DiD_DIDE: Digital Devices for Distance Learning
* DiD_IITR: IT Infrastructure and Technological Resources
* DiD_PEMS: Print Educational Materials (for pupils without Internet access)
* DiD_PCS: Psychological Counselling Services
* DiD_SEN: Support for Special Educational Needs
* DiD_SGSP: Support & Guidance for Pupils' Parents
* DiD_TVRH: TV and Radio Help
Syntax Files
* Synthetic Control Analysis of COVID-19 Distance Learning and Remediation Policies with PISA Mathematics Data.do
* Synthetic Control Analysis of COVID-19 Distance Learning and Remediation Policies with PISA Reading Data.do
* Synthetic Control Analysis of COVID-19 Distance Learning and Remediation Policies with PISA Science Data.do
Usage
1. Load the dataset for the desired subject (PISA Synthetic Control Analysis Dataset).
2. Run the corresponding command set (see above Syntax Files).
3. For each policy:
o Use Means as the outcome, CNTRYID as the panel unit, and Time as the time variable.
o Select the appropriate treatment dummy (e.g., DiD_TPFST).
o Include covariates (teacher_students_ratio, Age_of_SchoolTracking, SchoolClosure).
o Specify inference with vce(placebo) or vce(jackknife).
o Export and inspect e(omega) (country weights) and e(beta) (policy effect).
4. Use twoway line plots for visualization of treatment vs. control trajectories.
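The donor weights reported in e(omega) define the comparison trajectory. As a sketch in Python (for illustration only; the estimation itself is done by the sdid command in Stata), the synthetic outcome in each sweep is the omega-weighted average of donor-country means, and the gap between the treated country and this synthetic unit is what the line plots display. The country labels and scores below are invented, and the sketch omits the time weights and level shift that SDID additionally uses.

```python
def synthetic_trajectory(donor_means, omega):
    """Omega-weighted average of donor outcomes in each period.

    donor_means: {country_id: [mean score per sweep]}
    omega: {country_id: weight}; weights must be non-negative and sum to 1.
    """
    if any(w < 0 for w in omega.values()) or abs(sum(omega.values()) - 1.0) > 1e-9:
        raise ValueError("omega must be non-negative and sum to 1")
    n_periods = len(next(iter(donor_means.values())))
    return [sum(w * donor_means[c][t] for c, w in omega.items()) for t in range(n_periods)]


def gap(treated_means, synthetic):
    """Treated minus synthetic outcome, period by period."""
    return [y - s for y, s in zip(treated_means, synthetic)]


donors = {"A": [500.0, 510.0], "B": [480.0, 470.0]}
synth = synthetic_trajectory(donors, {"A": 0.25, "B": 0.75})
print(synth)                       # [485.0, 480.0]
print(gap([486.0, 470.0], synth))  # [1.0, -10.0]
```

A small pre-treatment gap is what makes the synthetic unit a credible comparison; the post-treatment gap is then read as the policy effect.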
Notes
* These are aggregated, country-level datasets; no microdata are included.
* Policy dummies are based on documented national measures during the pandemic.

Do File Name: Synthetic Control Analysis of COVID-19 Distance Learning and Remediation Policies with PISA Mathematics / Reading / Science Data.do
Overview
This repository provides three Stata .do syntax files that use PISA data (2000–2022) to construct and visualize parallel trends for COVID-19 distance learning and remediation policies across Mathematics, Reading, and Science, using a Synthetic Control framework to assess the pre-treatment comparability of treated and donor countries.
Syntax Files
1. Synthetic Control Analysis of COVID-19 Distance Learning and Remediation Policies with PISA Mathematics Data.do
o Uses Means (math scores) as the outcome variable.
2. Synthetic Control Analysis of COVID-19 Distance Learning and Remediation Policies with PISA Reading Data.do
o Uses Means (reading scores) as the outcome variable.
3. Synthetic Control Analysis of COVID-19 Distance Learning and Remediation Policies with PISA Science Data.do
o Uses Means (science scores) as the outcome variable.
Analysis Workflow
1. Load dataset: Open the "PISA Synthetic Control Analysis Dataset".
2. Run syntax: Select the subject-specific .do file.
3. Estimate effects:
o Use sdid to run Synthetic Control models.
o Include covariates (teacher_students_ratio, Age_of_SchoolTracking, SchoolClosure).
o Apply inference (vce(placebo) or vce(jackknife)).
4. Inspect results:
o matlist e(omega): country weights.
o matlist e(beta): estimated policy effects.
5. Visualize trajectories: Use twoway line plots to compare treated vs. control countries.
Dataset: PIRLS Synthetic Control Analysis Dataset
Description:
These datasets provide country-level panel data from PIRLS sweeps (2001–2021), designed for constructing and visualising parallel trends to evaluate the impact of COVID-19 distance learning and remediation policies using Synthetic Control methods.
Reading literacy at Grade 4, the only domain assessed in PIRLS, is analysed with a single syntax file using the PIRLS Synthetic Control Analysis Dataset.
Key features
* Outcome: Country-level mean achievement in Reading (Grade 4).
* Treatments: Policy-specific dummies that proxy COVID-19 education responses.
* Covariates: Teacher–student ratio, age of school tracking, and school closure days.
Variables
* Means: Country-level mean student achievement (Reading).
* CNTRYID: Country identifier.
* Time: PIRLS sweep year (2001, 2006, 2011, 2016, 2021).
* teacher_students_ratio: Teacher–student ratio.
* Age_of_SchoolTracking: Age of school tracking (average).
* SchoolClosure: Number of school closure days during COVID-19.
* weight: Placeholder for donor weights; can be replaced with post-estimation weights.
Policy treatment dummies:
* DiD_IIT: Increasing Instructional Time
* DiD_TPFST: Tutoring Programs / Financial Support for Tutoring
* DiD_DIDE: Digital Devices for Distance Learning
* DiD_IITR: IT Infrastructure and Technological Resources
* DiD_PEMS: Print Educational Materials (for pupils without Internet access)
* DiD_PCS: Psychological Counselling Services
* DiD_SEN: Support for Special Educational Needs
* DiD_SGSP: Support & Guidance for Pupils' Parents
* DiD_TVRH: TV and Radio Help
Syntax Files
* PIRLS Reading Synthetic Control command.do
Usage
1. Load the dataset for PIRLS Synthetic Control Analysis.
2. Run the corresponding command set (PIRLS Reading Synthetic Control command.do).
Notes
* These are aggregated, country-level datasets; no student microdata are included.
* Policy dummies are based on documented national measures during the pandemic.

Do File Name: PIRLS Reading Synthetic Control command.do
Overview
This repository provides Stata .do syntax files that use PIRLS data (2001–2021) to construct and visualise parallel trends for COVID-19 distance learning and remediation policies in Reading, using a Synthetic Control framework to assess the pre-treatment comparability of treated and donor countries.
Analysis Workflow
1. Load dataset: Open the "PIRLS Synthetic Control Analysis Dataset".
2. Run syntax: Execute PIRLS Reading Synthetic Control command.do.

Do File Names:
SDID_Analysis_PISA_Mathematics.do
SDID_Analysis_PISA_Reading.do
SDID_Analysis_PISA_Science.do
The files implement the same modelling logic but specify the subject-specific achievement variable as the outcome.
Overview
This repository provides three Stata .do syntax files designed to evaluate the effects of COVID-19 distance learning and remediation policies on student achievement using PISA country-level pseudo-panel data (2000–2022). The analysis applies the Synthetic Difference-in-Differences (SDID) method, with Mathematics, Reading, and Science analysed separately.
Data Access
* The analysis relies on restricted-use PISA microdata (2000–2022), which are not distributed here due to OECD copyright restrictions.
* Users must obtain original microdata from the OECD PISA Data Portal.
Model Specifications
* Main Specification (preferred):
o Uses log_SchoolClosure = log(SchoolClosure + 1)
o Rationale: the log transform captures the diminishing marginal effect of additional closure days, which yields more stable estimates.
* Robustness A (Linear):
o Replace log-transformed closure variable with raw SchoolClosure.
* Robustness B (Linear + Log):
o Include both SchoolClosure and log_SchoolClosure.
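The three specifications differ only in how school-closure days enter the model. A Python sketch of the covariate construction follows, for illustration; the specification labels are hypothetical. math.log1p(x) computes log(x + 1) as defined above and keeps countries with zero closure days at 0.

```python
import math


def closure_covariates(days, spec="main"):
    """Closure covariate(s) for one country under each specification."""
    log_sc = math.log1p(days)  # log_SchoolClosure = log(SchoolClosure + 1)
    if spec == "main":
        return {"log_SchoolClosure": log_sc}
    if spec == "robustness_a":
        return {"SchoolClosure": float(days)}
    if spec == "robustness_b":
        return {"SchoolClosure": float(days), "log_SchoolClosure": log_sc}
    raise ValueError(f"unknown specification: {spec}")


# Diminishing marginal effect: the first 100 days move the log scale far more than the next 100.
first_100 = math.log1p(100) - math.log1p(0)
next_100 = math.log1p(200) - math.log1p(100)
print(round(first_100, 3), round(next_100, 3))  # 4.615 0.688
```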
Weighting Notes
* country_weight encodes donor-pool weights by CNTRYID (from synthetic control step).
* Combined student weights are formed as:
o w_fstuwt = original_w_fstuwt * country_weight
o w_fsturwt1–w_fsturwt80 = each original replicate weight (w_fsturwt1–w_fsturwt80) * country_weight
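The weight combination described above can be sketched in Python as follows, for illustration only. It assumes each student record carries w_fstuwt and w_fsturwt1 to w_fsturwt80, and that countries missing from the country_weight mapping receive weight 0; that default is an assumption, not documented behaviour of the do-files. The function name and toy record are hypothetical.

```python
def combine_pisa_weights(student, country_weight, n_replicates=80):
    """Return a copy of one student record with all weights scaled by country_weight."""
    cw = country_weight.get(student["CNTRYID"], 0.0)  # assumed: unmapped countries get 0
    out = dict(student)  # leave the original record unchanged
    out["w_fstuwt"] = student["w_fstuwt"] * cw
    for r in range(1, n_replicates + 1):
        key = f"w_fsturwt{r}"
        out[key] = student[key] * cw
    return out


student = {"CNTRYID": 76, "w_fstuwt": 10.0, **{f"w_fsturwt{r}": 5.0 for r in range(1, 81)}}
combined = combine_pisa_weights(student, {76: 0.5})
print(combined["w_fstuwt"], combined["w_fsturwt80"])  # 5.0 2.5
```

Scaling the replicate weights by the same factor as the full-sample weight keeps the replicate-based standard errors consistent with the weighted point estimates.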
Variables
* Means: Country-level mean student achievement (subject-specific).
* Time: PISA sweep year (2000–2022).
* CNTRYID: Country identifier.
* Subject: Subject indicator (1=Math, 2=Reading, 3=Science).
* SchoolClosure: Number of school closure days.
* log_SchoolClosure: Log-transformed closure days.
* teacher_students_ratio: Teacher–student ratio.
* Age_of_SchoolTracking: Average age of school tracking.
* Sex: Student gender (0=male, 1=female).
* IMMIG: Immigration background (0=native, 1=first gen, 2=second gen).
* SES: Socioeconomic status (constructed grouping).
* country_weight: Donor weights from synthetic control.
* w_fstuwt: Combined sampling weight.
* w_fsturwt1–w_fsturwt80: Replicate weights.
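With w_fstuwt and 80 replicate weights, PISA sampling errors are conventionally computed by balanced repeated replication with Fay's adjustment (k = 0.5): the statistic is re-estimated with each replicate weight, and the variance is the sum of squared deviations from the full-sample estimate divided by G(1 - k)², i.e. by 20 when G = 80. The Python sketch below applies this to a weighted mean of a single plausible value. Whether the do-files compute it this way or through a Stata command is not documented here, and the toy data are invented.

```python
import math


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def brr_fay(values, full_weight, replicate_weights, k=0.5):
    """Weighted mean and its Fay-BRR standard error.

    replicate_weights: list of weight vectors (e.g. w_fsturwt1..w_fsturwt80),
    each aligned with `values`.
    """
    theta = weighted_mean(values, full_weight)
    g = len(replicate_weights)
    sq_dev = sum((weighted_mean(values, w) - theta) ** 2 for w in replicate_weights)
    variance = sq_dev / (g * (1 - k) ** 2)
    return theta, math.sqrt(variance)


scores = [400.0, 500.0]
full = [1.0, 1.0]
reps = [[1.5, 0.5] if g % 2 == 0 else [0.5, 1.5] for g in range(80)]
print(brr_fay(scores, full, reps))  # (450.0, 50.0)
```

In a full analysis this is repeated for each plausible value and the results are then combined across plausible values.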
Policy treatment dummies (examples):
* DiD_IIT: Increasing Instructional Time
* DiD_TPFST: Tutoring Programs / Financial Support for Tutoring
* DiD_DIDE: Digital Devices for Distance Learning
* DiD_IITR: IT Infrastructure & Technological Resources
* DiD_PEMS: Print Educational Materials
* DiD_PCS: Psychological Counselling Services
* DiD_SEN: Support for Special Educational Needs
* DiD_SGSP: Support & Guidance for Pupils' Parents
* DiD_TVRH: TV and Radio Help
Usage
1. Prepare the student-level dataset from the OECD PISA Data Portal. Data preparation requires merging datasets across sweeps (2000–2022). Variable names differ across sweeps; consult the "Methodology: Variables" section of the corresponding cycle's documentation for harmonisation details.
2. Load dataset into StataNow 18.5.
3. Run one of the .do files depending on subject domain.
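Harmonisation typically amounts to a per-sweep rename map followed by appending the sweeps. A Python sketch under that assumption is shown below. The source variable names in RENAME_MAPS are placeholders invented for illustration, not actual PISA variable names, which must be taken from each cycle's codebook.

```python
# Hypothetical rename maps: keys are placeholder source names, values the harmonised names.
RENAME_MAPS = {
    2018: {"GENDER_SRC_A": "Sex", "IMMIG_SRC_A": "IMMIG"},
    2022: {"GENDER_SRC_B": "Sex", "IMMIG_SRC_B": "IMMIG"},
}
REQUIRED = {"CNTRYID", "SWEEP", "Sex", "IMMIG"}


def harmonise_and_append(tables, rename_maps=RENAME_MAPS, required=REQUIRED):
    """Rename each sweep's variables, tag rows with SWEEP, and append all sweeps."""
    merged = []
    for sweep, rows in tables.items():
        mapping = rename_maps[sweep]
        for row in rows:
            new = {mapping.get(name, name): value for name, value in row.items()}
            new["SWEEP"] = sweep
            missing = required.difference(new)  # fail loudly on unharmonised variables
            if missing:
                raise KeyError(f"sweep {sweep}: missing {sorted(missing)}")
            merged.append(new)
    return merged


tables = {
    2018: [{"CNTRYID": 36, "GENDER_SRC_A": 1, "IMMIG_SRC_A": 0}],
    2022: [{"CNTRYID": 36, "GENDER_SRC_B": 0, "IMMIG_SRC_B": 2}],
}
for row in harmonise_and_append(tables):
    print(row)
# {'CNTRYID': 36, 'Sex': 1, 'IMMIG': 0, 'SWEEP': 2018}
# {'CNTRYID': 36, 'Sex': 0, 'IMMIG': 2, 'SWEEP': 2022}
```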

Do File Name: DID_Analysis_PIRLS_Reading.do
The file implements the same modelling logic as the PISA scripts but is adapted to PIRLS student-level data and applies reading achievement (plausible values) as the outcome.
Overview
This repository provides a Stata .do syntax file designed to evaluate the effects of COVID-19 distance learning and remediation policies on student reading achievement using PIRLS student-level microdata (2001–2021).
The analysis applies a Difference-in-Differences (DiD) framework with country donor-pool weights and replicate jackknife weights (JR1–JR250), following IEA PIRLS weighting methodology.
Data Access
* The analysis relies on restricted-use PIRLS microdata (2001–2021), which are not distributed here due to IEA copyright restrictions.
* Users must obtain original PIRLS datasets from the IEA Data Repository.
* Data preparation requires merging datasets across cycles (2001, 2006, 2011, 2016, 2021).
* Variable names differ across cycles; consult the Variables section of each PIRLS cycle's User Guide for harmonisation details.
Model Specifications
* Main Specification (preferred):
o log_SchoolClosure = log(SchoolClosure + 1)
o Rationale: the log transform captures diminishing marginal effects of additional closure days, which improves the stability of the estimates.
* Robustness A (Linear):
o Replace log-transformed closure variable with raw SchoolClosure.
* Robustness B (Linear + Log):
o Include both SchoolClosure and log_SchoolClosure.
Weighting Notes
* Student weights:
o TOTWGT is the PIRLS student weight (primary analysis).
o SCHWGT = school weight; MATWGT = teacher weight (not applied in this script).
* Replicate jackknife weights:
o PIRLS provides JKZONE and JKREP variables.
o Replicate weights JR1–JR250 are constructed following the IEA jackknife methodology, for each zone i = 1, ..., 125:
* If JKZONE == i: JRi = 2*TOTWGT*JKREP and JR(i+125) = 2*TOTWGT*(1-JKREP).
* Otherwise: JRi = JR(i+125) = TOTWGT.
* Country donor-pool weights:
o country_weight encodes donor weights by CNTRYID.
o Applied multiplicatively to student weights (TOTWGT) and replicate weights (JR1–JR250).
o Observations with zero combined weight are dropped.
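The replicate-weight construction and country-weight adjustment can be sketched in Python for a single student, for illustration. The sketch assumes 125 jackknife zones (hence 250 replicates, as described), JKZONE coded 1 to 125, JKREP coded 0/1, and that a student keeps TOTWGT in every replicate pair other than their own zone's, as in the standard IEA procedure. The function name is hypothetical.

```python
def pirls_student_weights(totwgt, jkzone, jkrep, country_weight, n_zones=125):
    """Return (WGT, [JR1..JR250]) for one student, or None if the combined weight is zero."""
    wgt = totwgt * country_weight  # WGT = TOTWGT x country_weight
    if wgt == 0:
        return None  # observations with zero combined weight are dropped
    jr = [totwgt] * (2 * n_zones)  # outside its own zone a student keeps TOTWGT
    if 1 <= jkzone <= n_zones:
        jr[jkzone - 1] = 2 * totwgt * jkrep                  # JRi
        jr[jkzone - 1 + n_zones] = 2 * totwgt * (1 - jkrep)  # JR(i+125)
    return wgt, [w * country_weight for w in jr]


wgt, jr = pirls_student_weights(totwgt=10.0, jkzone=3, jkrep=1, country_weight=0.5)
print(wgt, jr[2], jr[127], jr[0], len(jr))  # 5.0 10.0 0.0 5.0 250
```

Within each replicate pair, the two half-samples of a zone are reweighted in opposite directions, so JRi + JR(i+125) always equals 2*TOTWGT for a student in zone i.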
Variables
* pv1read–pv5read: PIRLS plausible values for reading achievement
* pvread: Imputed dependent variable (constructed from pv1–pv5)
* Time: PIRLS cycle year (2001, 2006, 2011, 2016, 2021)
* CNTRYID: Country identifier
* SchoolClosure: Number of school closure days during COVID-19
* log_SchoolClosure: Log-transformed school closure days
* Gender: Student gender (0=female, 1=male)
* IMMIG: Immigration background (0=native, 1=first gen, 2=second gen)
* SES_PCA: Socioeconomic status (PCA index or grouped measure)
* Language_at_home: Test language spoken at home indicator
* Age: Student age at test
* TOTWGT: PIRLS student weight
* JR1–JR250: Replicate jackknife weights (adjusted by country_weight)
* country_weight: Donor weights from synthetic control step
* WGT: Final combined weight = TOTWGT × country_weight
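Results obtained separately for each plausible value are normally combined with Rubin's rules: the point estimate is the average over the M plausible values, and the total variance is the average sampling variance plus (1 + 1/M) times the between-plausible-value variance. The Python sketch below uses invented numbers; whether pvread in the do-file is built this way or through Stata's multiple-imputation tools is not documented here.

```python
import math
import statistics


def combine_plausible_values(estimates, sampling_variances):
    """Rubin's rules for M plausible-value estimates and their sampling variances."""
    m = len(estimates)
    point = statistics.mean(estimates)
    within = statistics.mean(sampling_variances)
    between = statistics.variance(estimates)  # divides by M - 1
    total = within + (1 + 1 / m) * between
    return point, math.sqrt(total)


point, se = combine_plausible_values([500.0, 502.0, 498.0, 501.0, 499.0], [4.0] * 5)
print(point, round(se, 4))  # 500.0 2.6458
```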
Policy treatment dummies (examples):
* DiD_IIT: Increasing Instructional Time
* DiD_DIDE: Digital Devices for Distance Learning
* DiD_IITR: IT Infrastructure & Technological Resources
* DiD_PEMS: Print Educational Materials
* DiD_PCS: Psychological Counselling Services
* DiD_SEN: Support for Special Educational Needs
* DiD_SGSP: Support & Guidance for Pupils' Parents
* DiD_TVRH: TV and Radio Help
Usage
1. Prepare PIRLS datasets (2001–2021) from the IEA Data Repository. Harmonise variables across cycles.
2. Load the dataset into StataNow 18.5.
3. Run the provided .do file:
o DID_Analysis_PIRLS_Reading.do

Do File Name: SDID Parallel Trends Robustness Check for PISA Dataset.do
Overview
This repository provides Stata .do files for testing the parallel trends assumption in a Synthetic Difference-in-Differences (SDiD) framework. The analysis uses student-level PISA microdata (2000–2022) and evaluates whether treated and control countries followed similar pre-treatment achievement trends before COVID-19 policy implementation in 2020.
The approach follows an event-study specification, where event-time dummies capture relative achievement changes in the years leading up to and following treatment.
Data Access
* Restricted-use PISA student-level microdata (2000–2022) are required.
* The data are not included in this repository due to OECD copyright restrictions.
* Users must obtain the original data from the OECD PISA Data Portal.
* Variable names differ across cycles; consult the documentation for each PISA cycle for harmonisation details.
Model Specifications
Event-Time Setup
* Event time: xianhou = SWEEP - 2020 (only for treated countries).
* Dummies: xh1–xh6 (relative years, with xh6=0 as treatment year).
* Post-treatment year (2022) is dropped for the parallel trends test.
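The event-time setup can be sketched in Python as below, for illustration only. It computes xianhou for treated countries, drops sweeps after 2020 for the pre-trend test, and creates one 0/1 indicator per observed event-time value; how those values map onto the named dummies xh1 to xh6 in the do-files is not reproduced here. The function name and toy data are hypothetical.

```python
def add_event_time(rows, treated, treat_year=2020, last_sweep=2020):
    """Add xianhou and one 0/1 indicator per event-time value (pre-treatment sample only).

    rows: dicts with CNTRYID and SWEEP; treated: set of treated CNTRYID values.
    Returns the kept rows and the sorted list of event-time values.
    """
    kept = [dict(r) for r in rows if r["SWEEP"] <= last_sweep]  # drop post-treatment sweeps
    values = sorted({r["SWEEP"] - treat_year for r in kept if r["CNTRYID"] in treated})
    for r in kept:
        et = r["SWEEP"] - treat_year if r["CNTRYID"] in treated else None
        r["xianhou"] = et
        # Control countries have no event time, so every indicator is 0 for them.
        r["event_dummies"] = {v: int(et == v) for v in values}
    return kept, values


rows = [{"CNTRYID": c, "SWEEP": y} for c in (1, 2) for y in (2015, 2018, 2022)]
kept, values = add_event_time(rows, treated={1})
print(values)   # [-5, -2]
print(kept[1])  # {'CNTRYID': 1, 'SWEEP': 2018, 'xianhou': -2, 'event_dummies': {-5: 0, -2: 1}}
print(len(kept))  # 4
```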
Regression Specification
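$$
\begin{aligned}
\text{Achievement}_{pv} = {} & \beta_1\,\mathit{xh1} + \beta_2\,\mathit{xh2} + \cdots + \beta_6\,\mathit{xh6} \\
& + \text{Controls (Gender, SES, IMMIG, Age, Language\_at\_home, Grade)} \\
& + \text{Year Fixed Effects} + \text{Country Fixed Effects} + \varepsilon
\end{aligned}
$$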
* Outcome: Student achievement (Plausible Values: Math, Reading, Science).
* Fixed Effects: Country (absorbed by absorb(CNTRYID)), Year (sweep FE).
* Standard Errors: Clustered at the country level.
* Plot: Coefficients of event-time dummies with 95% confidence intervals.
Variables
Key Variables
* SWEEP: PISA test year (2000–2022).
* CNTRYID: Country identifier.
* Policy Dummy: Treatment indicator (e.g., IIT, TPFST, etc.).
* xianhou: Relative time (years since 2020).
* xh1–xh6: Event-time dummy variables.
Student-Level Controls
* pv@scie / pv@math / pv@read: Plausible values for achievement.
* Gender: Student gender (0=female, 1=male).
* IMMIG: Immigration background (0=native, 1=first gen, 2=second gen).
* SES_PCA: Socioeconomic status.
* AGE: Student age.
* Language_at_home: Test language spoken at home.
* Grade: Grade level at time of test.
Weights
* w_fstuwt: Student sampling weight.
* w_fsturwt1–w_fsturwt80: Replicate weights.
Policy Treatments Tested
* IIT: Increasing Instructional Time
* TPFST: Tutoring Programs / Financial Support for Tutoring
* DIDE: Digital Devices for Distance Learning
* IITR: IT Infrastructure & Technological Resources
* PCS: Psychological Counselling Services
* TVRH: TV and Radio Help
* PEMS: Print Educational Materials
* SEN: Support for Special Educational Needs
* SGSP: Support & Guidance for Pupils' Parents
Usage
1. Prepare dataset
o Merge PISA microdata across sweeps (2000–2022).
o Harmonize variable names across years.
o Construct policy dummies at the country level.
2. Run analysis
o Open Stata (version 18.5 recommended).
o Load prepared dataset.
o Run one .do file corresponding to the policy of interest.
3. Interpret results
o Pre-treatment coefficients (xh1–xh5) close to zero, with 95% confidence intervals that include zero, support the parallel trends assumption.
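A simple way to read the event-study plot is to check whether each pre-treatment 95% confidence interval covers zero. The Python sketch below does this from coefficients and standard errors (the numbers are invented, and the function name is hypothetical). Interval-by-interval checks are weaker than a joint test that all pre-treatment coefficients are zero, which in Stata can be run with the test command after the regression.

```python
def pretrend_check(coefs, ses, z=1.96):
    """95% CIs for pre-treatment coefficients and whether each covers zero."""
    intervals = {}
    for name, b in coefs.items():
        lo, hi = b - z * ses[name], b + z * ses[name]
        intervals[name] = (round(lo, 2), round(hi, 2), lo <= 0 <= hi)
    all_cover_zero = all(covers for _, _, covers in intervals.values())
    return all_cover_zero, intervals


ok, ci = pretrend_check({"xh1": 1.0, "xh2": -2.0}, {"xh1": 2.0, "xh2": 0.5})
print(ok)         # False
print(ci["xh2"])  # (-2.98, -1.02, False)
```

Here xh2's interval excludes zero, which would signal a pre-treatment divergence worth investigating.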

Do File Name: SDID Parallel Trends Robustness Check for PIRLS Dataset.do
Overview
This repository provides Stata .do files for testing the parallel trends assumption in a Synthetic Difference-in-Differences (SDiD) framework using student-level PIRLS microdata (2001–2021).
The analysis evaluates whether treated and control countries followed similar pre-treatment achievement trends before COVID-19 policy implementation in 2020.
The approach follows an event-study specification, where event-time dummies capture relative achievement changes in the years leading up to and following treatment.
Data Access
* Restricted-use PIRLS student-level microdata (2001–2021) are required.
* The data are not included in this repository due to IEA copyright restrictions.
* Users must obtain the original data from the IEA PIRLS Data Portal.
* Variable names differ across cycles; consult each PIRLS cycle＊s User Guide for harmonisation details.
Model Specifications
Event-Time Setup
* Event time: xianhou = SWEEP - 2020 (only for treated countries).
* Dummies: xh1–xh6 (relative years, with xh6=0 as treatment year).
* Post-treatment year (2021) is dropped for the parallel trends test.
Regression Specification
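$$
\begin{aligned}
\text{Achievement}_{pv} = {} & \beta_1\,\mathit{xh1} + \beta_2\,\mathit{xh2} + \cdots + \beta_6\,\mathit{xh6} \\
& + \text{Controls (Gender, SES, IMMIG, Age, Language\_at\_home, Grade)} \\
& + \text{Year Fixed Effects} + \text{Country Fixed Effects} + \varepsilon
\end{aligned}
$$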
* Outcome: Student achievement (plausible values for Reading).
* Fixed Effects: Country (absorbed by absorb(CNTRYID)), Year (sweep FE).
* Standard Errors: Clustered at the country level.
* Plot: Coefficients of event-time dummies with 95% confidence intervals.
Variables
Key Variables
* SWEEP: PIRLS test year (2001, 2006, 2011, 2016, 2021).
* CNTRYID: Country identifier.
* Policy Dummy: Treatment indicator (e.g., IIT, TPFST, etc.).
* xianhou: Relative time (years since 2020).
* xh1–xh6: Event-time dummy variables.
Student-Level Controls
* pv@read: Plausible values for Reading achievement (pv1read–pv10read).
* Gender: Student gender (0=female, 1=male).
* IMMIG: Immigration background (0=native, 1=first gen, 2=second gen).
* SES_PCA: Socioeconomic status (PCA-based or PIRLS SES index).
* AGE: Student age.
* Language_at_home: Test language spoken at home.
* Grade: Grade level at time of test.
Policy Treatments Tested
* IIT: Increasing Instructional Time
* DIDE: Digital Devices for Distance Learning
* IITR: IT Infrastructure & Technological Resources
* PCS: Psychological Counselling Services
* TVRH: TV and Radio Help
* PEMS: Print Educational Materials
* SEN: Support for Special Educational Needs
* SGSP: Support & Guidance for Pupils' Parents
Usage
1. Prepare dataset
o Merge PIRLS microdata across sweeps (2001–2021).
o Harmonize variable names across years.
o Construct policy dummies at the country level.
2. Run analysis
o Open Stata (version 18.5 recommended).
o Load prepared dataset.
o Run one .do file corresponding to the policy of interest.
3. Interpret results
o Pre-treatment coefficients (xh1–xh5) close to zero, with 95% confidence intervals that include zero, support the parallel trends assumption.