The University of Southampton
University of Southampton Institutional Repository

Improving disease misclassification and prevalence estimates by linking primary and secondary care electronic health records: an illustration from arthritis research

Yimer, Belay Birlie, Zhang, Fangyuan, Humphreys, Jenny, Lunt, Mark, Jani, Meghna, McBeth, John and Dixon, William G (2025) Improving disease misclassification and prevalence estimates by linking primary and secondary care electronic health records: an illustration from arthritis research. American Journal of Epidemiology. (doi:10.1093/aje/kwaf206).

Record type: Article

Abstract

Prevalence estimates using primary care health data identify cases via code lists. Validation studies can discover and exclude false positives, but it is often difficult or impossible to find false negatives. This study aimed, using the example of psoriatic arthritis (PsA), to examine the extent of and adjust for misclassification by linking primary care records with text-mined outpatient letters from a North-West regional hospital (2014-2019). 245 cases of PsA were identified among 188,286 adults registered with primary care, giving an observed prevalence of 0.13% [95% CI 0.11%-0.15%]. Among a subgroup of 7,532 primary care patients attending the hospital rheumatology clinic, 202 had a primary care PsA code: 188 were confirmed as true PsA, while 14 were false positives. Primary care codes failed to identify 196 hospital-diagnosed PsA cases, leading to a more than two-fold underestimation. The adjusted prevalence, accounting for misclassification, was 0.25% [95% CI 0.21%-0.28%]. Linking primary care with hospital records identified false positives and negatives, enabling correction of prevalence estimates. This highlights the value of text-mining hospital letters to compensate for the national absence of coded secondary care diagnosis data from outpatient departments, and the importance of considering the impact of false negatives.
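The arithmetic behind these figures can be sketched as follows. This is a rough illustration only, not the authors' method: it assumes a simple Wald interval for the observed prevalence and a back-of-envelope correction that scales the coded case count by the positive predictive value and divides by the sensitivity seen in the rheumatology clinic subgroup, treating those two quantities as if they applied to the whole registered population. All variable and function names are hypothetical; only the counts come from the abstract.

```python
from math import sqrt

def wald_ci(p, n, z=1.96):
    """Approximate 95% Wald interval for a proportion (an assumed choice of interval)."""
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Counts reported in the abstract
registered_adults = 188_286   # adults registered with primary care
coded_cases = 245             # adults with a primary care PsA code
true_positives = 188          # coded and confirmed by hospital letters
false_positives = 14          # coded but not confirmed
false_negatives = 196         # hospital-diagnosed but not coded

# Observed prevalence from primary care codes alone
observed = coded_cases / registered_adults
observed_low, observed_high = wald_ci(observed, registered_adults)

# Misclassification measures within the clinic subgroup
ppv = true_positives / (true_positives + false_positives)
sensitivity = true_positives / (true_positives + false_negatives)

# Back-of-envelope correction (assumption: subgroup PPV and sensitivity generalise)
adjusted = observed * ppv / sensitivity

print(f"observed prevalence: {observed:.2%} ({observed_low:.2%} to {observed_high:.2%})")
print(f"PPV: {ppv:.3f}, sensitivity: {sensitivity:.3f}")
print(f"adjusted prevalence: {adjusted:.2%}")
```

Under these assumptions the observed prevalence rounds to 0.13% and the corrected figure to 0.25%, in line with the reported estimates; the published confidence interval for the adjusted prevalence depends on the authors' own modelling and is not reproduced here.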

Text
kwaf206 - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 31 August 2025
e-pub ahead of print date: 17 September 2025
Published date: 17 September 2025
Additional Information: © The Author(s) 2025. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health.

Identifiers

Local EPrints ID: 506872
URI: http://eprints.soton.ac.uk/id/eprint/506872
ISSN: 0002-9262
PURE UUID: c0da9cbb-7b21-46af-966a-5f1814cb4367
ORCID for John McBeth: orcid.org/0000-0001-7047-2183

Catalogue record

Date deposited: 19 Nov 2025 17:42
Last modified: 20 Nov 2025 03:07

Contributors

Author: Belay Birlie Yimer
Author: Fangyuan Zhang
Author: Jenny Humphreys
Author: Mark Lunt
Author: Meghna Jani
Author: John McBeth
Author: William G Dixon
