The University of Southampton
University of Southampton Institutional Repository

Understanding the quality of ethnicity data recorded in health-related administrative data sources compared with Census 2021 in England

Background: electronic health records (EHRs) are increasingly used to investigate health inequalities across ethnic groups. While some studies show that the recording of ethnicity in EHRs is imperfect, there is no robust evidence on how accurately ethnicity is recorded in various real-world sources relative to census data.

Methods and findings: we linked primary and secondary care NHS England data sources with Census 2021 data and compared individual-level agreement of ethnicity recording in General Practice Extraction Service (GPES) Data for Pandemic Planning and Research (GDPPR), Hospital Episode Statistics (HES), Ethnic Category Information Asset (ECIA), and Talking Therapies for anxiety and depression (TT) with ethnicity reported in the census. Census ethnicity is self-reported and, therefore, regarded as the most reliable population-level source of ethnicity recording. We further assessed the impact of multiple approaches to assigning a person an ethnic category. The numbers of people who could be linked to the census from ECIA, GDPPR, HES, and TT were 47.4m, 43.5m, 47.8m, and 6.3m, respectively. Across all 4 data sources, the White British category had the highest level of agreement with census (≥96%), followed by the Bangladeshi category (≥93%). Levels of agreement for Pakistani, Indian, and Chinese categories were ≥87%, ≥83%, and ≥80% across all sources. Agreement was lower for Mixed (≤75%) and Other (≤71%) categories across all data sources. The categories with the lowest agreement were Gypsy or Irish Traveller (≤6%), Other Black (≤19%), and Any Other Ethnic Group (≤25%).

Conclusions: certain ethnic categories across all data sources have high discordance with census ethnic categories. These differences may lead to biased estimates of differences in health outcomes between ethnic groups, a critical data point used when making health policy and planning decisions.
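
As an illustration of the individual-level comparison described in the abstract, the following is a minimal sketch of how per-category agreement with the census might be computed from linked records. It assumes that agreement for a category is the percentage of people reporting that category in the census who have the same category in the health data source; the abstract does not state the exact definition, so this is an assumption. The function name agreement_by_category and the toy records are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def agreement_by_category(linked_pairs):
    """Percentage agreement per census category.

    linked_pairs: iterable of (census_category, source_category) tuples,
    one per linked person. Assumed definition (not from the paper):
    for each census category, the share of people whose category in the
    health data source is identical.
    """
    totals = defaultdict(int)   # people per census category
    matches = defaultdict(int)  # of those, people with the same source category
    for census_cat, source_cat in linked_pairs:
        totals[census_cat] += 1
        if source_cat == census_cat:
            matches[census_cat] += 1
    return {cat: 100.0 * matches[cat] / totals[cat] for cat in totals}


# Toy, made-up records purely for illustration (not study data).
pairs = [
    ("White British", "White British"),
    ("White British", "White British"),
    ("Bangladeshi", "Bangladeshi"),
    ("Other Black", "Black African"),
    ("Other Black", "Other Black"),
]
result = agreement_by_category(pairs)
for category, percent in sorted(result.items()):
    print(f"{category}: {percent:.1f}%")
```

Using the census category as the denominator reflects the abstract's treatment of the census as the most reliable source; a different denominator, such as the category recorded in the health data source, would answer a different question and give different percentages.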

Razieh, Cameron, Powell, Bethan, Drummond, Rosemary, Ward, Isobel L., Morgan, Jasper, Glickman, Myer, White, Chris, Zaccardi, Francesco, Hope, Jonathan, Raleigh, Veena, Akbari, Ashley, Islam, Nazrul, Yates, Thomas, Murphy, Lisa, Mateen, Bilal A., Khunti, Kamlesh and Nafilyan, Vahe (2025) Understanding the quality of ethnicity data recorded in health-related administrative data sources compared with Census 2021 in England. PLoS Medicine, 22 (2), [e1004507]. (doi:10.1371/journal.pmed.1004507).

Record type: Article

Text
journal.pmed.1004507 - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 2 December 2024
Published date: 26 February 2025

Identifiers

Local EPrints ID: 499699
URI: http://eprints.soton.ac.uk/id/eprint/499699
ISSN: 1549-1277
PURE UUID: b04e5214-ea04-4c2f-afe9-78aa3c8bc891
ORCID for Nazrul Islam: orcid.org/0000-0003-3982-4325

Catalogue record

Date deposited: 01 Apr 2025 16:32
Last modified: 22 Aug 2025 02:37

Contributors

Author: Cameron Razieh
Author: Bethan Powell
Author: Rosemary Drummond
Author: Isobel L. Ward
Author: Jasper Morgan
Author: Myer Glickman
Author: Chris White
Author: Francesco Zaccardi
Author: Jonathan Hope
Author: Veena Raleigh
Author: Ashley Akbari
Author: Nazrul Islam
Author: Thomas Yates
Author: Lisa Murphy
Author: Bilal A. Mateen
Author: Kamlesh Khunti
Author: Vahe Nafilyan


