The University of Southampton
University of Southampton Institutional Repository

A synthetic dataset for the exploration of survival and classification models: prediction of heart attack or stroke within a 10-year follow-up period


Burns, Dan, Driessens, Corine and Richardson, Kathryn (2024) A synthetic dataset for the exploration of survival and classification models: prediction of heart attack or stroke within a 10-year follow-up period. NIHR Open Research. (doi:10.3310/nihropenres.13651.1).

Record type: Article

Abstract

Machine learning methodologies are becoming increasingly popular in healthcare research. This shift to integrated data science approaches necessitates professional development of the existing healthcare data analyst workforce. To smooth this transition, educational resources need to be developed. Real healthcare datasets, vital for healthcare data analysis and training purposes, come with many barriers to access, including financial, ethical, and patient confidentiality concerns. Synthetic datasets that mimic real-world complexities offer a simple solution. The presented synthetic dataset mirrors routinely collected primary care data on heart attacks and strokes among the adult population. Training with this synthetic dataset is enhanced because the data incorporate many of the practical challenges encountered in routinely collected primary care data, such as missing data, informative censoring, interactions, variable irrelevance, and noise.

By openly sharing this synthetic dataset, our goal was to contribute a transformative asset for professional training in health and social care data analysis. The dataset covers demographics, lifestyle variables, comorbidities, systolic blood pressure, hypertension treatment, family history of cardiovascular diseases, respiratory function, and experience of heart attack and/or stroke. Methods for simulating each variable are detailed to ensure a realistic representation of the patient data. This initiative aims to bridge the gap in sophisticated healthcare datasets for training, fostering professional development in the healthcare and social care research workforce.

Text
29d02de8-c6e6-473d-afaa-80209c93988f_13651_-_dan_burns - Version of Record
Available under License Creative Commons Attribution.

More information

e-pub ahead of print date: 1 November 2024
Published date: 1 November 2024

Identifiers

Local EPrints ID: 501070
URI: http://eprints.soton.ac.uk/id/eprint/501070
ISSN: 2633-4402
PURE UUID: 4b71664c-6576-400a-8dec-aabd734ea27d
ORCID for Dan Burns: orcid.org/0000-0001-6976-1068
ORCID for Corine Driessens: orcid.org/0000-0003-3767-7683

Catalogue record

Date deposited: 22 May 2025 16:40
Last modified: 22 Aug 2025 02:19

Contributors

Author: Dan Burns
Author: Corine Driessens
Author: Kathryn Richardson
