READ ME File For 'Dataset title'

Dataset DOI: 10.5258/SOTON/D3700

ReadMe Author: Navamayooran Thavanesan, University of Southampton
ORCID ID: 0000-0002-7127-9606

This dataset supports the thesis entitled "The application of machine learning to the multidisciplinary assessment and management of oesophageal cancer"
AWARDED BY: University of Southampton
DATE OF AWARD: TBA

Date of data collection: 2009-2022
Information about geographic location of data collection: University Hospitals Southampton
Licence: CC-BY

--------------------
DATA & FILE OVERVIEW
--------------------

This dataset contains:

Full cohort v6 Pure.xlsx

This is an Excel spreadsheet tabulating the de-identified patients whose data were used to train machine learning models for treatment planning and palliative survival for the Oesophageal Cancer multidisciplinary team (MDT).

The workbook is split across 4 tabs:
    Curative + palliative
    Curative
    New Endo
    Palliative alone

The first tab collates all cases which directly train models. The subsequent tabs split the cases out by treatment plan.

The headers are a link ID for back-tracing, followed by clinical variables:
    Gender
    Age
    Smoking status
    Performance Status
    Tumour location
    Clinical T stage
    Clinical N stage
    Clinical M stage
    Treatment group
    Tumour histology
    History of Myocardial Infarction (MI)
    Congestive Heart Failure (CHF)
    Chronic Pulmonary Disease
    Connective Tissue Disorder
    Peripheral Vascular Disease
    Cerebrovascular Disease
    Dementia
    History of Peptic Ulcer Disease (X.PUD)
    Diabetes (uncomplicated)
    Diabetes (with complications)
    Leukemia
    Malignant Lymphoma
    Liver disease (mild)
    Liver disease (mod-severe)
    Hemiplegia
    Metastatic disease
    Renal failure
    AIDS
    Referring location
    American Society of Anaesthesiologists score
    Date of Diagnosis
    Year of Diagnosis
    Mortality status
    Date of recurrence
    Recurrence Pattern
    Date of last follow up
    Date last follow up was checked
    Date of death
    Date of last disease free survival
    Disease free survival calculated
    Overall survival calculated
    Status (dead or alive)
    Additional Link ID
    Date of Scans

This data was derived from a combination of 2 previous oesophagectomy databases held and maintained at University Hospitals Southampton, as well as the historic submissions to the National Oesophagogastric Audit Database.

Although this is version 6 of the file, the version number simply reflects successive rounds of pre-processing and clean-up of the dataset as it currently stands.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

Description of methods used for collection/generation of data:

Methods for processing the data:
Unstructured data from electronic health records were converted to either continuous variables (Age) or categorical variables. If the data header is a comorbidity, it is re-coded as Yes or No (Y/N). If it is a tumour characteristic, such as location or type of tumour, the data is coded based on the subcategories relevant to that variable.

Software: ML models were then trained and validated using R version 4.2.2.

Describe any quality-assurance procedures performed on the data:
The ReadMe author back-checked the data against health records to fill in missing data and correct any errors within the historically collected data.

People involved with sample collection, processing, analysis and/or submission:
The dataset was developed through successive iterations by the ReadMe author, Mr Ben Grace and Dr Saqib Rahman.

--------------------------
DATA-SPECIFIC INFORMATION
--------------------------

Number of variables: 44
Number of cases/rows: Max 1047
Variable list, defining any abbreviations, units of measure, codes or symbols used: Please see the list of headers under DATA & FILE OVERVIEW
Missing data codes: NA
Specialized formats or other abbreviations used: As above

Date that the file was created: This version last modified October 2025
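
Illustrative recoding sketch:
The re-coding described under "Methods for processing the data" (comorbidities to Y/N, Age as a continuous variable, missing values as NA) could be sketched as below. This is a minimal illustration only and is not the code used to prepare the dataset. The function names and the raw spellings accepted as "yes" or "no" are assumptions made for the example; the only values taken from this ReadMe are the Y/N coding and the NA missing data code.

```python
# Illustrative sketch only: not the pipeline used to build the dataset.
# The accepted raw spellings below are assumptions for this example.
YES_VALUES = {"y", "yes", "1", "true"}
NO_VALUES = {"n", "no", "0", "false"}
MISSING = "NA"  # missing data code stated in this ReadMe


def recode_comorbidity(raw):
    """Map a raw comorbidity entry to 'Y', 'N', or the missing code 'NA'."""
    if raw is None:
        return MISSING
    text = str(raw).strip().lower()
    if text in YES_VALUES:
        return "Y"
    if text in NO_VALUES:
        return "N"
    # Anything unrecognised is treated as missing rather than guessed.
    return MISSING


def parse_age(raw):
    """Return Age as a float (continuous variable), or None if it cannot be parsed."""
    try:
        return float(str(raw).strip())
    except ValueError:
        return None


# Hypothetical row, keyed by headers from the variable list in this ReadMe.
row = {"Age": "67", "Dementia": "no", "Renal failure": ""}
recoded = {
    "Age": parse_age(row["Age"]),
    "Dementia": recode_comorbidity(row["Dementia"]),
    "Renal failure": recode_comorbidity(row["Renal failure"]),
}
print(recoded)
```

Unrecognised entries are mapped to NA rather than to N, so that an unreadable record is not silently counted as the absence of a comorbidity.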