READ ME File For 'PhD Thesis datasets'

Dataset DOI: 10.5258/SOTON/D3748

ReadMe Author: MARK TAYLOR, University of Southampton, ORCID 0009-0008-6983-6128

This dataset supports the thesis entitled "Investigating spatio-temporal patterns in subsurface chlorophyll using biogeochemical-Argo floats and novel statistical methods"
AWARDED BY: University of Southampton
DATE OF AWARD: 2025

Licence: CC BY

Related projects/Funders: Funded by NE/S007210/1

--------------------
DATA & FILE OVERVIEW
--------------------

This dataset contains three .Rda files, each containing data relevant to specific chapters in the thesis:

1) data_chl_phy_zoo_ssh_prep.Rda
A .Rda file containing a data frame with information on subsurface chlorophyll maximum (SCM) properties and environmental covariates. The data frame has 122373 rows and 10 variables. Observations with NAs recorded for the SCM_depth and SCM_intensity variables were used for prediction (116726 rows).

2) all_profs_chl_bbp_covar.Rda
A .Rda file containing three data frames describing 1323 oceanographic profiles: one describing chlorophyll concentration, one describing particle backscatter, and one describing the covariates and profile information for each profile.

3) chl_temp_profiles.Rda
A .Rda file containing three objects describing oceanographic profiling data:
  (1) a data frame describing the location, time and World Meteorological Organization (WMO) number of 102867 Argo float profiles;
  (2) a matrix of 102867 chlorophyll profiles on a regular 5 m grid from 5 m to 250 m;
  (3) a matrix of 102867 temperature profiles on a regular 5 m grid from 5 m to 250 m.

If data was derived from another source, list source:
Profiling data was obtained from the BGC-Argo float data repository (https://biogeochemical-argo.org/).
Zooplankton data was taken from the “Global ocean low and mid trophic levels biomass content hindcast” Copernicus product (https://doi.org/10.48670/moi-00020).
Sea surface height data was taken from the “Global Ocean Gridded L 4 Sea Surface Heights And Derived Variables” product (https://doi.org/10.48670/moi-00148).

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

Description of methods used for collection/generation of data:

Methods for processing the data:

1) data_chl_phy_zoo_ssh_prep.Rda
Raw chlorophyll profiles were downloaded using the argoFloats package in R. For each profile, the maximum chlorophyll concentration and its associated depth were identified.

2) all_profs_chl_bbp_covar.Rda
Raw oceanographic profiles were downloaded using the argoFloats package in R. The euphotic depth, mixed layer depth and nitracline depth were calculated from profiles of light, potential density and nitrate, respectively.

3) chl_temp_profiles.Rda
Raw oceanographic profiles were downloaded using the argoFloats package in R, which retrieves profiling data from the Argo repositories. Profiles with coincident temperature and chlorophyll measurements were selected and regridded onto a regular 5 m grid from 5 m to 250 m.

Software- or Instrument-specific information needed to interpret the data, including software and hardware version numbers:
R version >= 4.0.0.

Standards and calibration information, if appropriate:
The Argo data management team perform data correction prior to release.

Describe any quality-assurance procedures performed on the data:
These data had already been checked and given positive quality control flags by the Argo data management team.
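
The two processing steps described above (identifying the chlorophyll maximum, and regridding onto the 5 m grid) can be illustrated with a minimal sketch. This is not the code used to produce the dataset, which was processed in R. Python is used here purely for illustration. The function names find_scm and regrid, the use of linear interpolation, and the example profile values are all assumptions made for the sketch, since this README does not state the interpolation method.

```python
import math

# Target grid stated in this README: 5 m to 250 m in 5 m steps (50 levels).
GRID = [float(d) for d in range(5, 255, 5)]


def find_scm(depths, chl):
    """Return (depth, value) of the maximum chlorophyll reading, skipping NaNs.

    Hypothetical helper: mirrors the SCM_depth / SCM_intensity idea only.
    """
    valid = [(d, c) for d, c in zip(depths, chl) if not math.isnan(c)]
    if not valid:
        return (math.nan, math.nan)
    return max(valid, key=lambda pair: pair[1])


def regrid(depths, values, grid=GRID):
    """Linearly interpolate a profile onto `grid` (an assumed method).

    Grid levels outside the sampled depth range are returned as NaN.
    """
    pairs = sorted((d, v) for d, v in zip(depths, values) if not math.isnan(v))
    out = []
    for g in grid:
        val = math.nan
        # Find the first pair of neighbouring samples that brackets g.
        for (d0, v0), (d1, v1) in zip(pairs, pairs[1:]):
            if d0 <= g <= d1:
                if d1 == d0:
                    val = v0
                else:
                    val = v0 + (v1 - v0) * (g - d0) / (d1 - d0)
                break
        out.append(val)
    return out


# Made-up example profile (depth in metres, chlorophyll in mg per cubic metre).
depths = [2.0, 8.0, 20.0, 60.0, 100.0, 180.0]
chl = [0.10, 0.12, 0.20, 0.80, 0.30, 0.05]

scm_depth, scm_intensity = find_scm(depths, chl)
gridded = regrid(depths, chl)
print(scm_depth, scm_intensity, len(gridded))
```

In this sketch, grid levels deeper than the last sample are left as NaN rather than extrapolated. Whether the original processing did the same is not stated here.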
--------------------------
DATA-SPECIFIC INFORMATION
--------------------------

1) data_chl_phy_zoo_ssh_prep.Rda

Number of variables: 10
Number of cases/rows: 122373
Variable list, defining any abbreviations, units of measure, codes or symbols used:
  time
  month
  longitude
  latitude
  SCM_depth: depth of the peak chlorophyll concentration on a given profile (metres)
  SCM_intensity: maximum chlorophyll concentration on a profile (mg per cubic metre)
  MLD: mixed layer depth (metres)
  zeu: euphotic depth (metres)
  zoo_all: depth-integrated mass of zooplankton carbon per square metre
Missing data codes: NA
Date that the file was created: September, 2024

2) all_profs_chl_bbp_covar.Rda

Both the all_profs_chl and all_profs_bbp data frames have one row per biogeochemical observation, with the following variables:
  .obs: profile number, as numbered by the dataset author
  .index: depth
  .value: measurement
  floatID: Argo float ID number
  profile: profile ID number, according to WMO

The all_profs_covar data frame has 1323 observations of the following 16 variables:
  .obs: self-defined profile number
  floatID: WMO-defined ID number
  profile_WMO: WMO number for individual profiles
  lon: longitude
  lat: latitude
  time
  ncline: nitracline depth
  zeu_old: euphotic depth estimated through one method
  zeu_true: euphotic depth estimated through an improved approach
  mld: mixed layer depth
  nsteepness: gradient of the nitracline
  nitrate: coincident nitrate profile
  PAR: coincident profile of photosynthetically available radiation
  density: coincident profile of potential density
  month
  season
Missing data codes: NA
Date that the file was created: July, 2024

3) chl_temp_profiles.Rda

The chl_array and temp_array matrices each have dimensions of 50 depths x 102867 profiles.
prof_info_df has 102867 rows of 5 variables:
  profID: WMO ID number for each profile
  time
  lon: longitude
  lat: latitude
  floatID: float WMO ID
Missing data codes: NA
Date that the file was created: March, 2025
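
The 50 depth levels of chl_array and temp_array follow from the 5 m grid spanning 5 m to 250 m. The sketch below shows one way to map a grid depth to its row and to pull out a single profile. It is illustrative only and written in Python rather than R. The helper names row_for_depth and profile_column are hypothetical, the toy values are made up, and it assumes the rows are ordered from shallowest (5 m) to deepest (250 m), which this README does not state explicitly.

```python
# Depth grid stated in this README: 5 m to 250 m in 5 m steps.
GRID_DEPTHS = list(range(5, 255, 5))


def row_for_depth(depth_m):
    """Return the 0-based row index of a grid depth (hypothetical helper).

    Assumes rows run from 5 m (row 0) to 250 m (row 49).
    """
    if depth_m not in GRID_DEPTHS:
        raise ValueError(f"{depth_m} m is not on the 5 m grid from 5 m to 250 m")
    return GRID_DEPTHS.index(depth_m)


def profile_column(array, col):
    """Extract one profile (all depths) from a depths x profiles nested list."""
    return [row[col] for row in array]


# Made-up stand-in for a depths x profiles matrix: 50 rows, 2 profiles.
toy_array = [[0.01 * d, 0.02 * d] for d in GRID_DEPTHS]

second_profile = profile_column(toy_array, 1)
print(len(GRID_DEPTHS), row_for_depth(100), len(second_profile))
```

Because R indexes from 1, the equivalent row under the same ordering assumption would be the depth divided by 5 (for example, row 20 for 100 m).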