READ ME File For 'Physics-Informed Neural Networks for Passive Scalar Emission and Transport'

Dataset DOI: 10.5258/SOTON/D3810

Date that the file was created: 01/2026

-------------------
GENERAL INFORMATION
-------------------

ReadMe Author: Joshua Ian Rawden, University of Southampton, 0009-0002-2755-4452

Date of data collection: 10/2024

Related projects:
The primary author acknowledges funding from the Antony Wright Scholarship, provided by the Department of Aeronautics and Astronautics at the University of Southampton.

--------------------------
SHARING/ACCESS INFORMATION
-------------------------- 

Licenses/restrictions placed on the data, or limitations of reuse: Open access, License CC BY

Recommended citation for the data: 

This dataset supports the publication:
AUTHORS: J. I. Rawden, C. Vanderwel, S. Symon
TITLE: Physics-Informed Neural Networks for Passive Scalar Emission and Transport
JOURNAL: Physical Review Fluids
PAPER DOI: 10.1103/wjlv-sy4j

--------------------
DATA & FILE OVERVIEW
--------------------

This dataset consists of three .zip archives:

- 'CodeAndData.zip'

	- 'Code' folder
		- Contains the Python programs used to train PINN models on data from the three flow cases. The flow case is given by the file name, e.g. 'surface', 'upstream', 'wake', and the file name ending identifies the independent variable that was varied:
			- Files ending with '_data' vary the training data grid density from 1 to 10.
			- Files ending with '_pr' vary the passive scalar molecular Prandtl/Schmidt number from 1 to 10.
			- Files ending with '_temp' vary the passive scalar training data grid density while keeping the velocity data grid fixed and sparse.
			- Files ending with '_g' or '_source_pr' correspond to the cropped/source domain models, which focus on identifying the passive scalar source properties from the training data by varying the passive scalar training grid density or the Prandtl/Schmidt number respectively.
			- Files ending with '_net' were used in the network size sensitivity study presented in the appendix.
		- 'vel_only.py' and 'vel_only_inlet.py' train PINNs with only velocity training data, without and with an inlet boundary condition respectively.
		- All training programs are commented and give additional details about the PINN architecture and training procedure.
		- The Jupyter notebook files 'plotting_script.ipynb' and 'PINN_functions.ipynb' produce field reconstructions for a series of models in sequence and were used to create the PINN reconstruction plots in the article.
		- Required packages for PINN model training are listed in 'requirements.txt'. Additional packages, such as 'matplotlib' and 'Jupyter', may be required to produce reconstructed fields with the provided scripts.

	- 'Datasets' folder
		- Mostly contains time-averaged CFD training data produced in OpenFOAM and stored as raw text files. The flow case is given by the file name prefix, e.g. 'surface', 'upstream', 'wake', followed by the molecular Prandtl/Schmidt number. For example, 'Surface10Fields.txt' contains data for the surface emission flow case at a Prandtl/Schmidt number of 10. If no Prandtl/Schmidt number appears in the file name, it was set to 1.
		- Files ending with '_g' provide the divergence of the turbulence fluxes, which was calculated as a post-processing step after the original datasets were generated.
		- For guidance on how to unpack the '.txt' training data files into the individual fields, please refer to the example programs in the 'Code' folder above.
		- In addition to the CFD training data, this folder contains supplementary data to assist with PINN training. For example, 'collocation_points.txt' provides a custom collocation point distribution that clusters points around the flow centreline and near the cylinder surface.
		- The file 'shedding.txt' gives data that can be used to validate the simulation accuracy by computing the von Kármán vortex shedding frequency.


- 'FullDomainModels.zip'
	- Contains models trained on the full PINN domain. The model file name structure is as follows: '{case}_{type}_{variable}-{epochs}.pt'. For example, 'surface_data_01-35798.pt' is a model trained on the surface emission flow case as part of the training data grid density investigation, with a data grid density h=1, and trained for 35798 epochs. The same naming convention applies to the Prandtl/Schmidt number ('_pr_'), variable scalar/fixed velocity grid ('_t_') and network size ('_net_') investigations.


- 'CroppedDomainModels.zip'
	- Contains models trained on the cropped/source PINN domain. The model file name structure is the same as above. Models with '_g_sparse_c_' were trained with a passive scalar training data grid density given by the variable number, while models with '_g_dense_c_' were trained with a passive scalar grid density equal to the variable number multiplied by 10. Models with '_source_pr_' were trained at a Prandtl/Schmidt number given by the variable number.
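
The model file name structure '{case}_{type}_{variable}-{epochs}.pt' described above can be split into its fields programmatically, which may help when looping over many models. The following is a minimal sketch only and is not part of the dataset: the names 'parse_model_name', 'ModelName' and the field name 'study' (used for the '{type}' field) are hypothetical, and the pattern assumes that the flow case contains no underscore and that the epoch count is an integer. The variable field is kept as text so that no assumption is made about its numeric format.

```python
import re
from typing import NamedTuple


class ModelName(NamedTuple):
    """Fields of a model file name (hypothetical helper, not part of the dataset)."""
    case: str      # flow case, e.g. 'surface', 'upstream', 'wake'
    study: str     # the '{type}' field, e.g. 'data', 'pr', 'g_sparse_c'
    variable: str  # the '{variable}' field as written in the name, e.g. '01'
    epochs: int    # number of training epochs


# Assumption: the case has no underscore; the type may contain underscores
# (e.g. 'g_sparse_c', 'source_pr'); the variable is the last field before '-'.
_PATTERN = re.compile(
    r"^(?P<case>[^_]+)_(?P<study>.+)_(?P<variable>[^_-]+)-(?P<epochs>\d+)\.pt$"
)


def parse_model_name(file_name: str) -> ModelName:
    """Split '{case}_{type}_{variable}-{epochs}.pt' into its four fields."""
    match = _PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unrecognised model file name: {file_name!r}")
    return ModelName(
        case=match["case"],
        study=match["study"],
        variable=match["variable"],
        epochs=int(match["epochs"]),
    )


# File name taken from the example given in this README.
print(parse_model_name("surface_data_01-35798.pt"))
# → ModelName(case='surface', study='data', variable='01', epochs=35798)
```

Because the '{type}' field may itself contain underscores, the sketch treats the first field as the case and the last field before the hyphen as the variable, and assigns everything in between to the type.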

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

For a description of methods used for the generation of training data and the training of models, please refer to the article linked to this dataset.