ETM: Enrichment by topic modeling for automated clinical sentence classification to detect patients' disease history
Given the rapid rate at which text data are being digitally gathered in the medical domain, there is a growing need for automated tools that can analyze clinical notes and classify their sentences in electronic health records (EHRs). This study uses EHR texts to detect patients’ disease history from clinical sentences. However, sentences in EHRs are shorter and less topic-focused than those in the general domain, which leads to sparse co-occurrence patterns and a lack of semantic features. To tackle this challenge, current approaches to clinical sentence classification depend on external information to improve classification performance. However, this is often infeasible owing to the lack of universal medical dictionaries. This study proposes the ETM (enrichment by topic modeling) algorithm, based on latent Dirichlet allocation, to smooth the semantic representations of short sentences. The ETM enriches a text representation by incorporating into it the probability distributions generated by an unsupervised algorithm. It takes the length of the original texts into account to enhance the representation, using an internal knowledge acquisition procedure. In clinical predictive modeling, interpretability improves the acceptance of a model. Thus, for clinical sentence classification, the ETM approach starts from a TF-IDF (term frequency-inverse document frequency) representation, and we use support vector machine and neural network algorithms for the classification task. We conducted three sets of experiments on a data set of clinical cardiovascular notes from the Netherlands to compare the sentence classification performance of the proposed method with that of prevalent approaches. The results show that the proposed ETM approach outperformed state-of-the-art baselines.
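The abstract does not spell out the exact enrichment formula. The following is therefore only a minimal, standard-library Python sketch of the general idea of enriching a TF-IDF sentence vector with LDA topic proportions, and it rests on several assumptions. LDA is fitted here with a small collapsed Gibbs sampler. TF-IDF uses one common smoothed IDF variant. The length-dependent weight `lam`, which gives shorter sentences more topic weight, is a hypothetical choice and not the authors' formula. The function names `tfidf`, `lda_gibbs`, and `enrich` and the toy sentences are illustrative only.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return the sorted vocabulary and one sparse TF-IDF dict per tokenized sentence."""
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))        # document frequency
    idf = {w: math.log(n / df[w]) + 1.0 for w in vocab}  # one common smoothed IDF variant
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append({w: (c / len(d)) * idf[w] for w, c in tf.items()})
    return vocab, vecs


def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one topic distribution (theta) per sentence."""
    rng = random.Random(seed)
    v = len({w for d in docs for w in d})
    ndk = [[0] * k for _ in docs]        # topic counts per sentence
    nkw = [Counter() for _ in range(k)]  # word counts per topic
    nk = [0] * k                         # total words per topic
    z = []                               # topic assignment of every token
    for di, d in enumerate(docs):
        z.append([])
        for w in d:
            t = rng.randrange(k)
            z[di].append(t)
            ndk[di][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
    for _ in range(iters):
        for di, d in enumerate(docs):
            for i, w in enumerate(d):
                t = z[di][i]
                # Remove the current token, then resample its topic from the full conditional.
                ndk[di][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                weights = [(ndk[di][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[di][i] = t
                ndk[di][t] += 1
                nkw[t][w] += 1
                nk[t] += 1
    return [[(ndk[di][j] + alpha) / (len(d) + k * alpha) for j in range(k)]
            for di, d in enumerate(docs)]


def enrich(docs, k=2, iters=200):
    """Concatenate dense TF-IDF features with length-weighted LDA topic proportions."""
    vocab, vecs = tfidf(docs)
    theta = lda_gibbs(docs, k, iters=iters)
    max_len = max(len(d) for d in docs)
    rows = []
    for d, vec, th in zip(docs, vecs, theta):
        # Hypothetical weight: shorter sentences (sparser TF-IDF) lean more on topic features.
        lam = 1.0 - len(d) / (max_len + 1)
        rows.append([vec.get(w, 0.0) for w in vocab] + [lam * p for p in th])
    return rows


sentences = [
    "history of myocardial infarction".split(),
    "no history of heart failure".split(),
    "patient reports chest pain".split(),
    "previous coronary bypass surgery".split(),
]
X = enrich(sentences, k=2)
print(len(X), len(X[0]))  # prints: 4 17 (15 vocabulary terms + 2 topic features)
```

In a full pipeline, the enriched rows would then be passed to a classifier such as a support vector machine or a neural network, as the abstract describes. That step is omitted here. Keeping the TF-IDF block unchanged and appending a small number of topic features preserves the per-term interpretability that motivates starting from TF-IDF.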
Keywords: Clinical sentence classification, Enriched text representation, Latent Dirichlet allocation, Sentence classification, Short text classification
Pages: 329-349
Bagheri, A.
Sammani, A.
Van Der Heijden, Peter
Asselbergs, F.W
Oberski, D.L
1 October 2020
Bagheri, A., Sammani, A., Van Der Heijden, Peter, Asselbergs, F.W and Oberski, D.L (2020) ETM: Enrichment by topic modeling for automated clinical sentence classification to detect patients' disease history. Journal of Intelligent Information Systems, 55 (2), 329-349. (doi:10.1007/s10844-020-00605-w).
More information
Accepted/In Press date: 14 April 2020
e-pub ahead of print date: 28 April 2020
Published date: 1 October 2020
Additional Information:
Publisher Copyright:
© 2020, The Author(s).
Identifiers
Local EPrints ID: 439589
URI: http://eprints.soton.ac.uk/id/eprint/439589
PURE UUID: bf2acb51-ea60-44fb-9adf-8b96aad3b491
Catalogue record
Date deposited: 27 Apr 2020 16:46
Last modified: 17 Mar 2024 05:29
Contributors
Author:
A. Bagheri
Author:
A. Sammani
Author:
F.W Asselbergs
Author:
D.L Oberski