The University of Southampton
University of Southampton Institutional Repository

Applying machine-learning to rapidly analyse large qualitative text datasets to inform the COVID-19 pandemic response

Background: machine-assisted topic analysis (MATA) uses artificial intelligence methods to assist qualitative researchers to analyse large amounts of textual data. This could allow qualitative researchers to inform and update public health interventions ‘in real-time’, to ensure they remain acceptable and effective during rapidly changing contexts (such as a pandemic). In this novel study we aimed to understand the potential for such approaches to support intervention implementation, by directly comparing MATA and ‘human-only’ thematic analysis techniques when applied to the same dataset (1472 free-text responses from users of the COVID-19 infection control intervention ‘Germ Defence’). 
Methods: in MATA, the analysis process included an unsupervised topic modelling approach to identify latent topics in the text. The human research team then described the topics and identified broad themes. In human-only codebook analysis, an initial codebook was developed by an experienced qualitative researcher and applied to the dataset by a well-trained research team, who met regularly to critique and refine the codes. To understand similarities and differences, formal triangulation using a ‘convergence coding matrix’ compared the findings from both methods, categorising them as ‘agreement’, ‘complementary’, ‘dissonant’, or ‘silent’.
Results: human analysis took much longer (147.5 hours) than MATA (40 hours). Both human-only and MATA identified key themes about what users found helpful and unhelpful (e.g. ‘Boosting confidence in how to perform the behaviours’ vs ‘Lack of personally relevant content’). Formal triangulation of the codes created showed high similarity between the findings. All codes developed from the MATA were classified as in agreement with or complementary to the human themes. Where the findings were classified as complementary, this was typically due to slightly differing interpretations or nuance present in the human-only analysis.
Conclusions: overall, the quality of MATA was as high as the human-only thematic analysis, with substantial time savings. For simple analyses that do not require an in-depth or subtle understanding of the data, MATA is a useful tool that can support qualitative researchers to interpret and analyse large datasets quickly. These findings have practical implications for intervention development and implementation, such as enabling rapid optimisation during public health emergencies.
Contributions to the literature
Natural language processing (NLP) techniques have been applied within health research due to the need to rapidly analyse large samples of qualitative data. However, the extent to which these techniques lead to results comparable to human coding requires further assessment. We demonstrate that combining NLP with human analysis to analyse free-text data can be a trustworthy and efficient method to use on large quantities of qualitative data. This method has the potential to play an important role in contexts where rapid descriptive or exploratory analysis of very large datasets is required, such as during a public health emergency.
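The triangulation step and the time comparison reported in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names `tally` and `time_saving` are hypothetical, and the category assigned to each example theme is invented purely for illustration. Only the four category labels, the two theme names, and the hours (147.5 and 40) come from the abstract.

```python
from collections import Counter

# The four categories named in the abstract's convergence coding matrix.
CATEGORIES = ("agreement", "complementary", "dissonant", "silent")

# Analysis times reported in the abstract (hours).
HUMAN_HOURS = 147.5
MATA_HOURS = 40.0


def tally(matrix):
    """Count how many findings fall into each convergence category."""
    counts = Counter(label for _, label in matrix)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return {c: counts.get(c, 0) for c in CATEGORIES}


def time_saving(human_hours, mata_hours):
    """Return (hours saved, percentage reduction relative to human-only)."""
    saved = human_hours - mata_hours
    return saved, 100.0 * saved / human_hours


# Hypothetical rows: the theme names come from the abstract, but the
# category assigned to each one here is invented for illustration.
example_matrix = [
    ("Boosting confidence in how to perform the behaviours", "agreement"),
    ("Lack of personally relevant content", "complementary"),
]

if __name__ == "__main__":
    print(tally(example_matrix))
    saved, pct = time_saving(HUMAN_HOURS, MATA_HOURS)
    print(f"{saved} hours saved ({pct:.1f}% less time)")
```

With the reported figures, the saving works out to 107.5 hours, or roughly 73% less analyst time than the human-only analysis.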
Acknowledgements
We would like to thank our voluntary research assistants, Benjamin Gruneberg, Lillian Brady, Georgia Farrance, Lucy Sellors, Kinga Olexa, and Zeena Abdelrazig, for their valuable contribution to the coding of the data for the human-only analysis. We would also like to acknowledge Katherine Morton’s contribution to the administration of the survey, and James Denison-Day for the construction and maintenance of the Germ Defence website.
medRxiv
Towler, Lauren
ebb4fb4e-703f-4e52-a9dc-53e72ca68e8f
Bondaronek, Paulina
315e63f0-9b9c-451a-87ae-736c663e08ca
Papakonstantinou, Trisevgeni
6e39c90c-6cf8-4311-8b5f-a7bcb2a37141
Amlôt, Richard
d93f5263-ea24-4b12-b505-f51694220b8e
Chadborn, Tim
fb42e42c-cac4-46bc-8f4f-07844add4d93
Ainsworth, Ben
b02d78c3-aa8b-462d-a534-31f1bf164f81
Yardley, Lucy
64be42c4-511d-484d-abaa-f8813452a22e

Record type: UNSPECIFIED

Text
2022.05.12.22274993v2.full - Author's Original
Available under License Creative Commons Attribution.
Download (251kB)

More information

Published date: 1 June 2022

Identifiers

Local EPrints ID: 474319
URI: http://eprints.soton.ac.uk/id/eprint/474319
PURE UUID: 896ba92d-e210-48af-94b0-0d46f8b420ce
ORCID for Lauren Towler: orcid.org/0000-0002-6597-0927
ORCID for Ben Ainsworth: orcid.org/0000-0002-5098-1092
ORCID for Lucy Yardley: orcid.org/0000-0002-3853-883X

Catalogue record

Date deposited: 17 Feb 2023 17:58
Last modified: 24 Apr 2024 01:52

Contributors

Author: Lauren Towler
Author: Paulina Bondaronek
Author: Trisevgeni Papakonstantinou
Author: Richard Amlôt
Author: Tim Chadborn
Author: Ben Ainsworth
Author: Lucy Yardley
