The University of Southampton
University of Southampton Institutional Repository

ETM: Enrichment by topic modeling for automated clinical sentence classification to detect patients' disease history

Bagheri, A., Sammani, A., Van Der Heijden, Peter, Asselbergs, F.W and Oberski, D.L (2020) ETM: Enrichment by topic modeling for automated clinical sentence classification to detect patients' disease history. Journal of Intelligent Information Systems, 55 (2), 329-349. (doi:10.1007/s10844-020-00605-w).

Record type: Article

Abstract

Given the rapid rate at which text data are being digitally gathered in the medical domain, there is a growing need for automated tools that can analyze and classify clinical notes in electronic health records. One challenge in classifying clinical notes is the sparsity of co-occurrence patterns in short texts. Current approaches to short text classification depend on external information to improve classification performance. However, this is often impractical in the clinical setting owing to a lack of universal medical dictionaries. This study proposes the ETM (enrichment by topic modeling) algorithm, based on latent Dirichlet allocation, to smooth the semantic representations of short texts. The ETM enriches the text representation by incorporating the probability distributions generated by an unsupervised algorithm. It takes the length of the original texts into account and enhances the representation through an internal knowledge acquisition procedure. In clinical predictive modeling, interpretability improves the acceptance of a model. Thus, for clinical short text classification, the ETM approach starts from an initial TF-IDF (term frequency-inverse document frequency) representation, and we use support vector machine and neural network algorithms for the classification task. We conducted three sets of experiments on a dataset of clinical cardiovascular notes from the Netherlands to compare the classification performance of the proposed method with that of prevalent approaches. The results show that the proposed ETM approach outperformed state-of-the-art baselines.
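The enrichment idea in the abstract (a TF-IDF representation augmented with topic probability distributions from latent Dirichlet allocation) can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' ETM implementation. It simply appends each sentence's LDA topic proportions, estimated here with a small collapsed Gibbs sampler, to its TF-IDF vector. It does not reproduce ETM's length-dependent enrichment or the downstream SVM and neural network classifiers. The function names (`tfidf_vectors`, `lda_topic_proportions`, `enrich`), the smoothed IDF variant, the hyperparameters `alpha`, `beta` and `iters`, and the toy sentences are all hypothetical.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return the sorted vocabulary and one TF-IDF vector per tokenized document."""
    vocab = sorted({w for d in docs for w in d})
    n_docs = len(docs)
    df = Counter(w for d in docs for w in set(d))
    # Smoothed IDF: an assumed variant, not necessarily the one used in the paper.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[w] / len(d) * idf[w] if d else 0.0 for w in vocab])
    return vocab, vectors


def lda_topic_proportions(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Estimate per-document topic proportions (theta) by collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_vocab for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Random initial topic assignment for every token.
    for di, d in enumerate(docs):
        zs = []
        for w in d:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[di][z] += 1
            topic_word[z][index[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)
    for _ in range(iters):
        for di, d in enumerate(docs):
            for pos, w in enumerate(d):
                wi, z = index[w], assignments[di][pos]
                # Remove the token's current assignment from all counts.
                doc_topic[di][z] -= 1
                topic_word[z][wi] -= 1
                topic_total[z] -= 1
                # Full conditional p(z = k | all other assignments).
                weights = [
                    (doc_topic[di][k] + alpha) * (topic_word[k][wi] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[di][pos] = z
                doc_topic[di][z] += 1
                topic_word[z][wi] += 1
                topic_total[z] += 1
    # theta_dk = (n_dk + alpha) / (n_d + K * alpha); each row sums to 1.
    return [
        [(doc_topic[di][k] + alpha) / (len(d) + n_topics * alpha) for k in range(n_topics)]
        for di, d in enumerate(docs)
    ]


def enrich(docs, n_topics=2):
    """Concatenate each document's TF-IDF vector with its LDA topic proportions."""
    vocab, tfidf = tfidf_vectors(docs)
    theta = lda_topic_proportions(docs, n_topics)
    return vocab, [t + th for t, th in zip(tfidf, theta)]


notes = [
    "history of myocardial infarction".split(),
    "no history of heart failure".split(),
    "patient reports chest pain".split(),
    "prior myocardial infarction and stent".split(),
]
vocab, X = enrich(notes, n_topics=2)
print(len(X), len(X[0]))  # prints: 4 16 (14 vocabulary terms + 2 topics)
```

The enriched vectors would then be passed to a classifier such as a support vector machine. Appending dense topic proportions gives short, sparse sentences non-zero features even when they share no words, which is the intuition behind smoothing short-text representations. In practice, a library LDA implementation would normally replace the hand-written sampler.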

Accepted Manuscript (text, 2MB): ETM Enrichment by Topic Modeling for Automated Clinical Sentence._
Accepted Manuscript (text, restricted to repository staff only): JIIS-D-19-00360_R2_Accepted
Version of Record (text, 957kB, available under License Creative Commons Attribution): Bagheri 2020 Article ETM Enrichment By Topic Modeling

More information

Accepted/In Press date: 14 April 2020
e-pub ahead of print date: 28 April 2020
Published date: October 2020
Keywords: Clinical sentence classification, Enriched text representation, Latent Dirichlet allocation, Sentence classification, Short text classification

Identifiers

Local EPrints ID: 439589
URI: http://eprints.soton.ac.uk/id/eprint/439589
PURE UUID: bf2acb51-ea60-44fb-9adf-8b96aad3b491
ORCID for Peter Van Der Heijden: orcid.org/0000-0002-3345-096X

Catalogue record

Date deposited: 27 Apr 2020 16:46
Last modified: 28 Apr 2022 04:34

Contributors

Author: A. Bagheri
Author: A. Sammani
Author: Peter Van Der Heijden
Author: F.W Asselbergs
Author: D.L Oberski


Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2

This repository has been built using EPrints software, developed at the University of Southampton, but available to everyone to use.
