The University of Southampton
University of Southampton Institutional Repository

ETM: Enrichment by topic modeling for automated clinical sentence classification to detect patients' disease history



Bagheri, A., Sammani, A., Van Der Heijden, Peter, Asselbergs, F.W and Oberski, D.L (2020) ETM: Enrichment by topic modeling for automated clinical sentence classification to detect patients' disease history. Journal of Intelligent Information Systems, 55 (2), 329-349. (doi:10.1007/s10844-020-00605-w).

Record type: Article

Abstract

Given the rapid rate at which text data are being digitally gathered in the medical domain, there is a growing need for automated tools that can analyze clinical notes and classify their sentences in electronic health records (EHRs). This study uses EHR texts to detect patients’ disease history from clinical sentences. However, sentences in EHRs are shorter and less topic-focused than those in the general domain, which leads to sparse co-occurrence patterns and a lack of semantic features. To tackle this challenge, current approaches to clinical sentence classification depend on external information to improve classification performance. However, this is impractical owing to the lack of universal medical dictionaries. This study proposes the ETM (enrichment by topic modeling) algorithm, based on latent Dirichlet allocation, to smooth the semantic representations of short sentences. The ETM enriches the text representation by incorporating into it probability distributions generated by an unsupervised algorithm. It takes the length of the original texts into account and enhances the representation through an internal knowledge acquisition procedure. In clinical predictive modeling, interpretability improves the acceptance of a model. Thus, for clinical sentence classification, the ETM approach employs an initial TF-IDF (term frequency-inverse document frequency) representation, and we use support vector machine and neural network algorithms for the classification task. We conducted three sets of experiments on a data set consisting of clinical cardiovascular notes from the Netherlands to test the sentence classification performance of the proposed method in comparison with prevalent approaches. The results show that the proposed ETM approach outperformed state-of-the-art baselines.
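
The enrichment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that enrichment means appending each sentence's LDA topic proportions to its TF-IDF vector, and it leaves out the length-dependent step because the abstract does not say how sentence length is used. The function names, the hyperparameters (`alpha`, `beta`, `n_iter`), the small Gibbs sampler, and the example sentences are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs, vocab):
    """Dense TF-IDF vectors (smoothed IDF) for tokenized, non-empty documents."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    idf = {w: math.log((1 + n_docs) / (1 + doc_freq[w])) + 1.0 for w in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[w] / len(doc) * idf[w] for w in vocab])
    return vectors


def lda_topic_proportions(docs, vocab, n_topics, n_iter=200,
                          alpha=0.1, beta=0.01, seed=0):
    """Per-document topic proportions from a small collapsed Gibbs sampler."""
    rng = random.Random(seed)
    word_id = {w: i for i, w in enumerate(vocab)}
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * len(vocab) for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    def update(d, v, k, delta):
        # Add or remove one token of word v in document d from topic k.
        doc_topic[d][k] += delta
        topic_word[k][v] += delta
        topic_total[k] += delta

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        topics = [rng.randrange(n_topics) for _ in doc]
        assignments.append(topics)
        for word, k in zip(doc, topics):
            update(d, word_id[word], k, +1)

    # Resample each token's topic from its conditional distribution.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                update(d, v, assignments[d][i], -1)
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][v] + beta)
                    / (topic_total[k] + beta * len(vocab))
                    for k in range(n_topics)
                ]
                k_new = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k_new
                update(d, v, k_new, +1)

    # Smoothed estimate of each document's topic distribution (sums to 1).
    return [
        [(c + alpha) / (len(doc) + alpha * n_topics) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]


# Made-up sentences for illustration only; not taken from the study's data set.
sentences = [
    "history of myocardial infarction",
    "no chest pain today",
    "prior infarction and bypass surgery",
    "chest pain on exertion",
]
docs = [s.split() for s in sentences]
vocab = sorted({w for doc in docs for w in doc})
N_TOPICS = 2  # arbitrary choice for the toy example

tfidf = tfidf_vectors(docs, vocab)
theta = lda_topic_proportions(docs, vocab, N_TOPICS)

# Assumption: enrichment is plain concatenation of the two representations.
enriched = [t + p for t, p in zip(tfidf, theta)]
print(len(enriched), len(enriched[0]))
```

In this sketch each enriched vector has one entry per vocabulary term followed by one entry per topic. A classifier such as a support vector machine or a neural network, as mentioned in the abstract, would then be trained on these vectors; that step is omitted here.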

Text: ETM Enrichment by Topic Modeling for Automated Clinical Sentence._ - Accepted Manuscript
Text: JIIS-D-19-00360_R2_Accepted - Accepted Manuscript (restricted to repository staff only)
Text: Bagheri 2020 Article ETM Enrichment By Topic Modeling - Version of Record (available under a Creative Commons Attribution license)

More information

Accepted/In Press date: 14 April 2020
e-pub ahead of print date: 28 April 2020
Published date: 1 October 2020
Additional Information: Publisher Copyright: © 2020, The Author(s).
Keywords: Clinical sentence classification, Enriched text representation, Latent Dirichlet allocation, Sentence classification, Short text classification

Identifiers

Local EPrints ID: 439589
URI: http://eprints.soton.ac.uk/id/eprint/439589
PURE UUID: bf2acb51-ea60-44fb-9adf-8b96aad3b491
ORCID for Peter Van Der Heijden: orcid.org/0000-0002-3345-096X

Catalogue record

Date deposited: 27 Apr 2020 16:46
Last modified: 27 Sep 2022 04:01


Contributors

Author: A. Bagheri
Author: A. Sammani
Author: Peter Van Der Heijden
Author: F.W Asselbergs
Author: D.L Oberski
