######################################################################################
# R source code to perform analyses in Chapter 4 of the PhD thesis:                  #
# "Stochastic Modelling and Projection of Age-Specific Fertility Rates"              #
# by Joanne Ellison.                                                                 #
# Please email J.Ellison@soton.ac.uk with any queries.                               #
######################################################################################

## 1. Set-up
#To run this code you will need R version 3.4.0 or above and the ability to install
#packages, in particular rstan, which is required to implement the Hamiltonian
#Monte Carlo methodology used to fit the models. On Windows this will probably also
#require you to install the Rtools toolchain (not an R package), as explained in the
#rstan installation instructions at "https://github.com/stan-dev/rstan/wiki".
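
#Optional check (a minimal sketch, not part of the original analysis): warns if the
#running R version is older than 3.4.0 and reports whether rstan can already be found.
#It only prints messages and does not install or load anything.
if (getRversion() < "3.4.0") {
  warning("R ", as.character(getRversion()), " detected; version 3.4.0 or above is required.")
}
if (!requireNamespace("rstan", quietly = TRUE)) {
  message("rstan is not installed yet; see the rstan installation instructions.")
}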

#This code builds on the modelling in Chapter 3, so begin by following Steps 1 and 3 in
#"chap3/source_file.r" to process the UKHLS and HDI data and obtain the mean Q imputations.

#Install required packages (only needed the first time the R code is run)
source("chap4/scripts/install_packages.r")

#Data processing
#Reads in and processes the ONS parity-specific fertility rates from
#"https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/adhocs/11482fertilityratesbyparity1934to2018englandandwales".
#Saves rates, births, exposures and proportions for ages 15-44, cohorts 1945-2003 and maximum year 2018
#in "chap4/data" as "ONS2018_allc.RData", and for cohorts 1945-1992 and maximum year 2008 as "ONS2018_resc.RData".
source("chap4/scripts/process_ons.r")
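#Optional check (a minimal sketch): confirms that the two files described above were
#written to "chap4/data". The object names ons_files and missing_ons are illustrative
#and are not used elsewhere in the code.
ons_files <- file.path("chap4", "data", c("ONS2018_allc.RData", "ONS2018_resc.RData"))
missing_ons <- ons_files[!file.exists(ons_files)]
if (length(missing_ons) > 0) {
  warning("Expected ONS output not found: ", paste(missing_ons, collapse = ", "))
}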

## 2. Exploratory data analysis
#Generates and saves Figures 4.1-4.3 in "chap4/plots" (using each version of the ONS data for Figure 4.1).
source("chap4/scripts/eda.r")

## 3. Covariate models
#Fits the chosen Q|A,C covariate models from Section 4.3.2.2 for parities 0, 1 and 2, saving the Stan output 
#in "chap4/results". Also generates and saves Figures 4.4, 4.7 and 4.8 in "chap4/plots".
source("chap4/scripts/model_QAC.r")
#Fits the chosen T|A,(Q) covariate models from Section 4.3.2.3 for parities 1, 2 and 3+, saving the Stan output
#in "chap4/results". Also generates and saves Figures 4.9, 4.12 and 4.13 in "chap4/plots".
source("chap4/scripts/model_TAQ.r")

## 4. Multiple imputation of qualification
#Fits a Bayesian version of the imputation model and generates imputations from 10 random iterations of the output.
#Saves Stan output in "chap4/results" as "Qimp.RData", and saves imputation results in "chap4/results" as 
#"QMI.RData". Also generates and saves Figures 4.14-4.16 in "chap4/plots".
source("chap4/scripts/multiply_impute_Q.r")
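#Illustration only (a minimal sketch, not the code used in "multiply_impute_Q.r"): one
#way to pick 10 iterations at random from a set of saved posterior iterations. The
#function name, its arguments and the seed handling are hypothetical.
draw_imputation_iterations <- function(n_iter, n_imp = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   #optional seed for reproducibility
  stopifnot(n_iter >= n_imp)
  sort(sample.int(n_iter, n_imp))      #sample without replacement, then order
}
#Example: draw_imputation_iterations(n_iter = 4000, seed = 1), where 4000 is an
#assumed number of saved iterations.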

## 5. Fit models
source("chap4/scripts/important_functions.r")
#Fits UKHLS-only GAMs to all parities using the mean Q imputation and saves Stan output in "chap4/results".
source("chap4/scripts/fit_UKHLS_only.r")
#Fits integrated models and saves output in "chap4/results" for parities 0, 1, 2 and 3+.
#Note that the code fits the 50/50 model by default - the 33/67, 25/75, 20/80 and 10/90 models can be fitted
#by uncommenting the appropriate lines (removing the leading #'s) in the R scripts.
#The code uses the ONS data up to 2018 by default. To perform backtesting and only use the ONS data up to
#2013, change "back" from FALSE to TRUE in line 6 (line 5 for parity 3+) of the R scripts.
#The code uses the mean Q imputation by default. To fit the models using the Q imputations generated in 
#Step 4, change "imp" from FALSE to TRUE in line 7, and then choose the imputation number in line 10.
#Note that the parity 3+ model does not include Q.
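#Optional helper (a minimal sketch): prints the first lines of a script with their line
#numbers, so that the "back" and "imp" switches mentioned above can be located before
#editing. The function name show_script_lines and its default range are illustrative.
show_script_lines <- function(path, lines = 1:12) {
  src <- readLines(path, warn = FALSE)
  lines <- lines[lines <= length(src)]   #ignore line numbers beyond the end of the file
  writeLines(sprintf("%3d: %s", lines, src[lines]))
  invisible(src[lines])
}
#Example: show_script_lines("chap4/scripts/fit_integrated_p0.r")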
source("chap4/scripts/fit_integrated_p0.r")
source("chap4/scripts/fit_integrated_p1.r")
source("chap4/scripts/fit_integrated_p2.r")
source("chap4/scripts/fit_integrated_p3.r")

## 6. Plot results
#Processes models fitted in Step 5 using the mean Q imputation (for a given parity, the UKHLS-only model requires
#at least one integrated model to have been fitted in order to perform the marginalisation). The code also 
#assumes that the same model(s) have been fitted for each parity. The user must indicate which models have been
#fitted on line 13. Also note that some figures require certain models to have been fitted (see code).
#The code generates and saves Figures 4.17-4.28 in "chap4/plots". The code also extracts and saves the samples
#of the marginalised probabilities across the parities for each model separately in "chap4/results". 
source("chap4/scripts/plot_results.r")

## 7. Backtesting results
#Processes backtesting models fitted in Step 5 using the mean Q imputation. Requires the user to specify (in line 12)
#an integrated model that has been fitted to each parity using the ONS data up to 2018 and up to 2013 (i.e. with
#back = FALSE and back = TRUE respectively in Step 5). The code generates and saves Figures 4.29-4.32 in "chap4/plots".
source("chap4/scripts/backtesting.r")

## 8. Multiple imputation results - parity 0
#Processes models fitted in Step 5 using the mean Q imputation and the 10 imputations from Step 4. Requires the user
#to specify (in line 12) an integrated model that has been fitted to all 11 Q imputations for parity 0. The code
#generates and saves Figures 4.33-4.34 in "chap4/plots".
source("chap4/scripts/MI_results_p0.r")

## 9. Aggregate forecasts
#Uses the marginalised probability samples saved in Step 6 to generate ASFR and CFR forecasts for each model as 
#described in Section 4.5. Requires the user to specify which models have been fitted on line 12. The code generates
#and saves Figures 4.36 and 4.38 in "chap4/plots".
source("chap4/scripts/agg_forecast.r")
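#Illustration only (a minimal sketch, not the method in "agg_forecast.r"): assuming CFR
#denotes the cohort fertility rate obtained by summing a cohort's ASFRs over age, the
#aggregation step can be written as a row sum. The function name and the layout of the
#input (cohorts in rows, ages in columns) are assumptions; Section 4.5 remains the
#authoritative description.
cfr_from_asfr <- function(asfr) {
  stopifnot(is.matrix(asfr), is.numeric(asfr))
  rowSums(asfr)   #one value per cohort (row); NA if any age is missing
}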
