A synthetic dataset for the exploration of survival and classification models: prediction of heart attack or stroke within a 10-year follow-up period
Machine learning methodologies are becoming increasingly popular in healthcare research. This shift to integrated data science approaches necessitates professional development of the existing healthcare data analyst workforce. To smooth this transition, educational resources need to be developed. Access to real healthcare datasets, which are vital for healthcare data analysis and training, is restricted by many barriers, including financial, ethical, and patient confidentiality concerns. Synthetic datasets that mimic real-world complexities offer a simple solution. The presented synthetic dataset mirrors routinely collected primary care data on heart attacks and strokes among the adult population. Training with this synthetic dataset is enriched because the data incorporate many of the practical challenges encountered in routinely collected primary care data, such as missing data, informative censoring, interactions, variable irrelevance, and noise.
By openly sharing this synthetic dataset, our goal was to contribute a transformative asset for professional training in health and social care data analysis. The dataset covers demographics, lifestyle variables, comorbidities, systolic blood pressure, hypertension treatment, family history of cardiovascular diseases, respiratory function, and experience of heart attack and/or stroke. Methods for simulating each variable are detailed to ensure a realistic representation of patient data. This initiative aims to address the shortage of sophisticated healthcare datasets for training, fostering professional development in the health and social care research workforce.
Burns, Dan
40b9dc88-a54a-4365-b747-4456d9203146
Driessens, Corine
59335f14-4ead-4692-9969-7ed9cc1ccf08
Richardson, Kathryn
d2a7c467-ecdb-4a61-bc07-c3367aca34a6
1 November 2024
Burns, Dan, Driessens, Corine and Richardson, Kathryn (2024) A synthetic dataset for the exploration of survival and classification models: prediction of heart attack or stroke within a 10-year follow-up period. NIHR Open Research. (doi:10.3310/nihropenres.13651.1).
Text
29d02de8-c6e6-473d-afaa-80209c93988f_13651_-_dan_burns
- Version of Record
More information
e-pub ahead of print date: 1 November 2024
Published date: 1 November 2024
Identifiers
Local EPrints ID: 501070
URI: http://eprints.soton.ac.uk/id/eprint/501070
ISSN: 2633-4402
PURE UUID: 4b71664c-6576-400a-8dec-aabd734ea27d
Catalogue record
Date deposited: 22 May 2025 16:40
Last modified: 22 Aug 2025 02:19
Contributors
Author: Dan Burns
Author: Corine Driessens
Author: Kathryn Richardson