READ ME File For 'Resurrecting Second Harmonic Generation (SHG) Case Study Dataset'

Dataset DOI: 10.5258/SOTON/PSDI0002

Date that the file was created: April, 2025

-------------------
GENERAL INFORMATION
-------------------

ReadMe Author: Dr Samantha Pearman-Kanza, University of Southampton, ORCID: 0000-0002-4831-9489

Dataset Authors:
- Dr Stephen Gow, University of Southampton, ORCID: 0000-0003-0121-1697
- Dr Don Cruickshank, University of Southampton, ORCID: 0000-0002-0777-0855
- Dr Jack Doyle, University of Southampton, LinkedIn: http://www.linkedin.com/in/jack-doyle-175183323/
- Dr Samantha Pearman-Kanza, University of Southampton, ORCID: 0000-0002-4831-9489
- Professor Jeremy G. Frey, University of Southampton, ORCID: 0000-0003-0842-4302

Date of data collection: 01/06/2023-01/10/2024

Information about geographic location of data collection: Southampton

Related projects: Physical Sciences Data Infrastructure (PSDI)

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

Licenses/restrictions placed on the data, or limitations of reuse: CC-BY

Recommended citation for the data:

This dataset supports the PSDI Case Study:
AUTHORS: Matthew Partridge, Stephen Gow, Jack Doyle, Don Cruickshank, Samantha Pearman-Kanza, Jeremy G. Frey
TITLE: Resurrecting Second Harmonic Generation (SHG) Case Study
REPORT TYPE: PSDI Case Study
REPORT DOI: 10.5258/SOTON/PSDI0001

--------------------
DATA & FILE OVERVIEW
--------------------

This dataset contains:
- new_shgdatabase.sql - This script will create the required databases
- shg_dashboard.twb - This is the Tableau file to visualise the data in the SHG databases created by the script above

This dataset contains supporting information for the PSDI Case Study: Resurrecting Second Harmonic Generation (SHG), conducted by the Physical Sciences Data Infrastructure (PSDI) Initiative (www.psdi.ac.uk).

-------------------------------------------------
INSTRUCTIONS FOR DATABASE SETUP AND VISUALISATION
-------------------------------------------------

This section explains how to set up the SHG database on your own computer. While the steps are intended to be intuitive to follow, some knowledge of MySQL usage is advised. Guidance on MySQL can be found on the MySQL website (https://dev.mysql.com/doc/mysql-getting-started/en/).

You will first need to download and install versions of MySQL (https://dev.mysql.com/doc/mysql-getting-started/en/) and MySQL Shell (https://dev.mysql.com/downloads/shell/) compatible with your operating system. Then, establish a MySQL connection to host the database; the following steps assume that this was done via the root user. You will also need to download the SQL file “new_shgdatabase.sql” (link to be added).

Open a Terminal or Command Prompt window and run the following commands (note that the source statement may take up to an hour to process):

-- mysqlsh
-- \sql
-- \connect root@localhost
-- [Password for your local instance]
-- \source [path of file new_shgdatabase.sql]

You will only need to run these commands once to instantiate the database. Note that the file path will need to be surrounded by quotation marks (") if it contains spaces.

To verify that the setup was successful, first check that the four databases described in the following section are all present. Databases can be viewed in MySQL Workbench (link) or by running the following command in MySQL:

-- SHOW DATABASES;

To check that the databases have been populated correctly, run the following commands in MySQL or MySQL Workbench:

-- USE shgdata;
-- SELECT COUNT(*) FROM shgdata.data_point;

This should give a result of 21,824,799.

--------------------
DATABASE DESCRIPTION
--------------------

The SHG data structure consists of four databases, named “shgdata”, “middleware_data”, “middleware_setup” and “middleware_monitor”.
The tables in the final two databases relate to the experimental design, maintenance and monitoring of the experiments which generated the SHG data; these are of limited interest for data analysis purposes, so we do not describe them in detail here.

The largest database by size is “middleware_data”. This consists of three tables with ten columns each (three of which are empty) containing information on temperature and humidity across the course of the experiments. Of particular interest in each table are the columns “data_time”, recording the time and date of the observation; “data_sensorname”, recording the type of observation taken; and “data_value”, recording the temperature or humidity observed at that time.

The database “shgdata” contains the majority of the output data of interest for analysis of the project. It consists of 16 tables covering the setup and outputs of each experiment. Each experiment is a combination of several experimental runs, which are themselves made up of individual data points. The observations from the polarisation experiments are stored in the table “data_point”, with supporting information about the runs and experiments contained in the “exp_run”, “experiment” and “model_data” tables. The table “spectra_point” contains the wavelength and value of the observed spectra.

------------------------
DATA VISUALISATION SETUP
------------------------

Once the database has been built, users are free to construct any data visualisation tool they wish. Details of the visualisations used in our work, which are closely based on those from the original project outputs, are given in the DATA VISUALISATION DETAILS section.

We have also built a data visualisation portal for the database in Tableau, a proprietary software platform. We will now describe the process of accessing this portal, while acknowledging that it will not be the best option for every use case of the dataset.

Using the portal requires a Tableau account.
Accounts are available free of charge to academic researchers (link). The process of gaining a free account may be time-consuming, and can require email correspondence in addition to the form on the website to confirm that the software will be used only for non-commercial purposes. A 14-day free trial is available if short-term access is required more quickly.

Once you have an account, download the file “shg_dashboard.twb” (link to be added). Open this file in Tableau and create MySQL connections giving the workbook access to both the “shgdata” and “middleware_data” databases. You will need to provide the correct username and password for the MySQL connections; due to the idiosyncrasies of the Tableau software, this may take multiple attempts. You will then have to allow a significant amount of time for the workbook to build the resources required for the visualisations to work. Opening the workbook again after establishing the connections and building the resources should be significantly quicker, although on several occasions during testing we had to edit the connection manually to ensure that the correct credentials were accepted by the software.

--------------------------
DATA VISUALISATION DETAILS
--------------------------

The visualisation tool consists of four plots displaying different (but related) information on the nature and output of the experiments. The plots can be interacted with by the user to obtain additional detail; interacting with a particular experiment in the timeline plot will generate plots for the selected experiment, and interacting with a specific run in an experiment plot will generate plots for the selected run.

The plot named “ExpList” and titled “Project Timeline” displays the start and end time of every experiment conducted during the project, colour coded by type of experiment. The plot “Experiment” shows the observed signal against the input polarisation for each experiment, colour coded by output polarisation.
This plot is only meaningful for experiments of the types “polarisation sweep” and “DOE experiment”. The plot “Run” displays the details of the data points within each experimental run. Finally, temperature and humidity data across the timespan of each experiment are shown in the plot “TempHum”, titled “Environmental conditions”.
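
Users who prefer to build their own visualisations rather than use the Tableau workbook may first want to inspect the schema outlined under DATABASE DESCRIPTION. The statements below are a minimal sketch of how this could be done in MySQL Shell (SQL mode) or MySQL Workbench. They rely only on the database, table and column names given in this README and on the standard MySQL information_schema views; they have not been tested against the dataset, and the column layout of each table should be confirmed from the output rather than assumed.

```sql
-- Exploratory sketch only. Run after connecting as described in the setup
-- instructions. Names are taken from this README; nothing else about the
-- schema is assumed.

-- List the tables in the main results database (the README states 16).
-- TABLE_ROWS may be an estimate (for example with InnoDB), so use
-- COUNT(*) on a table when an exact figure is needed.
SELECT TABLE_NAME, TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'shgdata'
ORDER BY TABLE_NAME;

-- Show the column definitions of two tables named in DATABASE DESCRIPTION.
DESCRIBE shgdata.data_point;
DESCRIBE shgdata.spectra_point;

-- The README does not name the three middleware_data tables, so look up
-- which tables carry the three columns of interest.
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'middleware_data'
  AND COLUMN_NAME IN ('data_time', 'data_sensorname', 'data_value')
ORDER BY TABLE_NAME, ORDINAL_POSITION;
```

The table and column names returned by these statements can then be used to write SELECT queries for whichever plotting tool is preferred.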